Abstract

Two chapters of this thesis analyze expert consulting problems via game theoretic models;
the first points out a close connection between the problem of consulting a set of experts and
the problem of searching. The last chapter presents a solution to the dictionary problem of
supporting SEARCH and update (INSERT and DELETE) operations on a set of key values.

The first chapter shows that the problem of consulting experts on-line can be modeled
by a chip game similar, and in some cases identical, to the Paul-Carole games used to model
a faulty search process. It presents the best known worst-case algorithms for consulting
finitely many experts, and the best possible algorithms for consulting infinitely many experts
(model selection) under some assumptions. It includes new results about faulty search
processes as well as generalizations and new proofs of some known results.

The second chapter uses properties of coalitional games to analyze the performance
of the greedy heuristic for the problem of hiring experts from a pool of candidates using
stochastic data. Through an analysis of the greedy approximate solution of the s-median
problem, the results suggest an alternative to a known algorithm for learning Lipschitz
functions by a memory-based learning system.

The third and last chapter is dedicated to the scapegoat tree data structure: a solution
to the dictionary problem that uses binary trees with no auxiliary balancing data stored
at the tree nodes to achieve logarithmic worst-case search time and logarithmic amortized
update time.

All chapters explore alternatives to the now standard worst-case analysis of algorithms.
The first chapter introduces and advocates the notions of opportunism and almost opportunism
of on-line algorithms. The second chapter contrasts the pessimism of worst-case
analysis with the optimism of the greedy heuristic, and points out some benefits of exploring
the latter. The last chapter evaluates a novel data structure by computing its amortized
performance.

Thesis Supervisor: Ronald L. Rivest 

Title: E.S. Webster Professor of Computer Science 
Introduction



This thesis addresses two problems: consulting a set of experts, and searching. The problem
of consulting a set of experts, or combining information from different sources to reach
conclusions, is a very common one. Many algorithmic tasks of producing
a specific output based on a given input fit this description.

Many, if not all, human behaviors seem to be the result of processing incoming information
from multiple sources. Turing's test suggests intelligence can be measured by its
resemblance to human behavior. The first two chapters of this thesis evolved from research
in the area of Computational Learning Theory [38], the area of theoretical computer science
that is most closely related to research on artificial intelligence.

We do not address the problem of consulting experts in its full generality. Rather we are 
concerned with two aspects of it: consulting experts on-line under worst-case assumptions 
about the input and selecting a "good" subset from a given pool of experts. 

The problem of searching for a particular named element within a given set is one of
the fundamental problems of theoretical computer science [49]. This thesis (in Chapter
1) establishes a close connection between the problems of searching a set using unreliable
information and consulting experts on-line.

The last chapter addresses another variant of the problem of searching, that of finding 
an element in a dynamically changing set. 

0.1 Chapter 1: Consulting a Set of Experts On-Line and Faulty Searching

The first chapter defines and explores a class of multistage games that capture information
theoretic aspects of on-line learning. Our framework encompasses both the problem of
consulting finitely many experts and the problem of model selection from infinitely many
candidates. We introduce the PM algorithms, which achieve the game's value for some
families of inputs and come within a constant multiplicative factor of it for others. Thus they
provide simultaneous upper and lower bounds on the complexity of the problems addressed.
Worst-case analysis of algorithms can be justified as a search for a solution that minimizes
the risk involved. However, in applications the algorithm will often be faced with inputs
that are easier than the worst case. By introducing new notions of algorithmic complexity,
opportunism and almost opportunism, we distinguish on-line algorithms which take full
advantage of favorable situations without taking unnecessary risks.

Using this new notion of opportunism, we prove that our algorithms, unlike previous 
algorithms, are almost opportunistic in games with finitely many experts. This suggests 
that to achieve optimal on-line learning performance the manager has to gather information 
in rounds in which she does not err, as well as in rounds in which she errs. This conclusion 
is in contrast to the often indistinguishable asymptotic performance of learning algorithms 
that gather information in all rounds and those that gather it only in rounds in which the 
algorithm errs. 

We apply the PM algorithms to the previously unaddressed question of consulting 
experts over arbitrary finite decision domains of size > 2, and also allow the learner to 
incorporate a prior on experts' quality. 

The family of games discussed herein is closely related to the well-investigated Paul- 
Carole search games. In these games a searcher, Paul, tries to find a target value from a set 
of candidate values by questioning Carole. Carole is allowed to lie in some of her answers. 
It is shown that games in which the manager is evaluated on the number of mistakes she
makes are reducible to games similar to the standard Paul-Carole search games in which
the goals of the two sides are reversed, while expert consulting games in which the manager
is evaluated on the number of mistakes she makes in excess of her best advisor or advisors
are reducible to the standard Paul-Carole search games. Our analysis of these games allows
a uniform derivation of generalizations of some known results. 

For decision makers, our proof offers some insight into the folk wisdom that "the
hardest decisions to make are the least important ones".

The algorithms presented are named after a combinatorial entity they utilize, the Pascal 
Matrix. 
0.2 Chapter 2: Greedy Expert Hiring and an Application

The second chapter (based on Galperin [27]) addresses the problem of hiring a set of experts 
from a pool of candidates. Modeling this problem by a coalitional game, a uniform lower 
bound on the performance of the greedy heuristic for a family of games is derived. We 
show a uniform bound for this family also holds when only approximate rather than exact 
values of coalitions are known. One of the prettiest applications of this general analysis is 
to the s-median problem. 

Approximation algorithms for the s-median problem are a useful tool in learning Lipschitz
functions in the generalized PAC learning model of Haussler [34, 35]. To approximate
a Lipschitz function a memory-based learning system can be used, as proposed by Lin and
Vitter [42]. We generalize the analysis of a greedy approximate solution of the s-median
problem first considered by Cornuejols et al. [21]. We then compare its performance to
the performance of Lin and Vitter's linear programming approximate solution of the same
problem as a tool in the construction of memory-based learning systems. We find the
greedy approximation is simpler and more efficient, and in many cases yields a smaller system.

0.3 Chapter 3: Searching a Dynamically Changing Set

The last chapter (based on Galperin and Rivest [28]) is dedicated to the problem of supporting
searches of a dynamically changing set of keys. An algorithm for maintaining binary
search trees is presented. The amortized complexity per INSERT or DELETE is O(log n),
while the worst-case cost of a SEARCH is O(log n).

Scapegoat trees, unlike most balanced-tree schemes, do not require keeping extra data
(e.g. "colors" or "weights") in the tree nodes. Each node in the tree contains only a key
value and pointers to its two children. Associated with the root of the whole tree are the
only two extra values needed by the scapegoat scheme: the number of nodes in the whole
tree, and the maximum number of nodes in the tree since the tree was last completely
rebuilt.

In a scapegoat tree a typical rebalancing operation begins at a leaf, and successively 
examines higher ancestors until a node - the scapegoat - is found that is so unbalanced 
that the entire subtree rooted at the scapegoat can be rebuilt at zero cost, in an amortized 
sense. 
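The rebalancing step just described can be illustrated with a small Python sketch. It follows the spirit of the scheme above but is not the procedure as specified in Chapter 3. It makes several assumptions of its own: the balance parameter alpha = 0.7, the rule that a new leaf is "too deep" when its depth exceeds floor(log_{1/alpha} n), and all class and function names. Subtree sizes are recomputed on the fly, so nodes hold only a key and two child pointers. Only INSERT and SEARCH are shown; DELETE, which would use the stored maximum size, is omitted.

```python
import math


class Node:
    # Each node stores only a key and two child pointers.
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def size(node):
    # Subtree sizes are recomputed on demand; nothing extra is stored in nodes.
    return 0 if node is None else 1 + size(node.left) + size(node.right)


def flatten(node, out):
    # In-order traversal: collects the subtree's nodes in sorted key order.
    if node is not None:
        flatten(node.left, out)
        out.append(node)
        flatten(node.right, out)
    return out


def build_balanced(nodes, lo, hi):
    # Relinks nodes[lo:hi] into a perfectly balanced subtree and returns its root.
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    root = nodes[mid]
    root.left = build_balanced(nodes, lo, mid)
    root.right = build_balanced(nodes, mid + 1, hi)
    return root


class ScapegoatTree:
    def __init__(self, alpha=0.7):
        self.alpha = alpha  # assumed balance parameter, 1/2 < alpha < 1
        self.root = None
        self.n = 0          # number of nodes in the whole tree
        self.max_n = 0      # maximum size since the last full rebuild (used by DELETE, not shown)

    def _depth_limit(self):
        # Depth beyond which a new leaf triggers a search for a scapegoat.
        return math.floor(math.log(self.n, 1 / self.alpha))

    def search(self, key):
        node = self.root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node is not None

    def insert(self, key):
        new = Node(key)
        if self.root is None:
            self.root, self.n, self.max_n = new, 1, 1
            return
        path, node = [], self.root  # path holds the new leaf's ancestors, root first
        while node is not None:
            if key == node.key:
                return  # duplicates are ignored in this sketch
            path.append(node)
            node = node.left if key < node.key else node.right
        if key < path[-1].key:
            path[-1].left = new
        else:
            path[-1].right = new
        self.n += 1
        self.max_n = max(self.max_n, self.n)
        if len(path) > self._depth_limit():
            self._rebalance(path, new)

    def _rebalance(self, path, leaf):
        # Walk up from the leaf until some ancestor has a child that is too heavy.
        child, child_size = leaf, 1
        for i in range(len(path) - 1, -1, -1):
            node = path[i]
            sibling = node.right if node.left is child else node.left
            node_size = child_size + size(sibling) + 1
            if child_size > self.alpha * node_size:  # node is the scapegoat
                rebuilt = build_balanced(flatten(node, []), 0, node_size)
                if i == 0:
                    self.root = rebuilt
                elif path[i - 1].left is node:
                    path[i - 1].left = rebuilt
                else:
                    path[i - 1].right = rebuilt
                return
            child, child_size = node, node_size
```

Inserting keys in sorted order would produce a path of length n in an unbalanced binary search tree. Here it triggers periodic rebuilds of scapegoat subtrees that keep the height logarithmic.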

Like the algorithms of the previous section, the algorithm for maintaining scapegoat 
trees enjoys the benefit of simplicity. 
Chapter 1

Opportunistic Algorithms for Expert

1.1 Introduction

The problem of consulting a set of experts is a "master problem" that encompasses many
other problems. For example, we can ask how humans generate their behavior based
on sensory input, or how a computer can produce a desired output as a function of the
boolean "advice" given by its input bits.

Here we explore the problem of consulting on-line a set of experts providing boolean
advice to a manager who has to reach a boolean decision. It can be described as follows.
A manager and a set of experts are presented a common "yes/no" question. The experts
advise the manager on the correct reply, then she makes a decision, after which the correct
reply is revealed. This is repeated in a sequence of rounds. We explore the worst-case
performance of the manager, and hence assume that the experts' votes as well as the correct
reply are chosen by an adversary. The manager's aim is to stay not too far behind the best
advisors, when all are evaluated by the number of mistakes they made on the sequence.
More cannot be hoped for in the worst case, as no a priori assumptions are made about
the experts or about the problem domain.
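The round structure just described can be made concrete with a small simulation. The sketch below rests on several assumptions. Options are encoded as 1 and 2. The advice and correct replies are fixed in advance instead of being chosen adaptively by an adversary. The manager's rule, a majority vote among the experts with the fewest mistakes so far, is a hypothetical placeholder and not one of the algorithms analyzed in this chapter.

```python
from typing import List, Tuple


def play_rounds(advice: List[List[int]], answers: List[int]) -> Tuple[int, List[int]]:
    """Run the on-line protocol; return (manager's mistakes, each expert's mistakes).

    advice[t][x] is expert x's vote (1 or 2) in round t; answers[t] is the correct
    reply, revealed only after the manager has made her decision.
    """
    expert_mistakes = [0] * len(advice[0])
    manager_mistakes = 0
    for votes, correct in zip(advice, answers):
        # Hypothetical manager rule: follow the majority of the currently best
        # experts, breaking ties in favor of option 1.
        best = min(expert_mistakes)
        leaders = [v for v, m in zip(votes, expert_mistakes) if m == best]
        decision = 1 if leaders.count(1) >= leaders.count(2) else 2
        if decision != correct:
            manager_mistakes += 1
        # Once the correct reply is revealed, charge every expert who voted wrongly.
        for x, v in enumerate(votes):
            if v != correct:
                expert_mistakes[x] += 1
    return manager_mistakes, expert_mistakes
```

Comparing the manager's count with `min(expert_mistakes)` gives the quantity she aims to keep small: her excess over the best advisor.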

The problem of algorithmically consulting a finite set of experts on-line has been investigated
under a variety of assumptions. (The on-line problem is distinguished by the manager
having to reach a decision in every round, before seeing all inputs.) Littlestone and Warmuth's
[44] WM (Weighted Majority) algorithm and later the BW (Binomial Weighting)
algorithm by Cesa-Bianchi et al. [18] address the question of consulting a finite number of
experts that provide boolean advice. The prediction domain of either the manager or the
experts may be modified to be the real interval [0, 1] as in Cesa-Bianchi et al. [17], Littlestone
and Warmuth [44], Haussler et al. [32] and Vovk [70]. For this prediction domain
various loss functions may be considered (Vovk [70], Haussler et al. [32]).

We generalize the problem of consulting finitely many experts by considering arbitrary 
measurable sets of experts of finite measure, and letting the decision domain of the manager 
and her advisors be a finite set. We discuss optimal algorithms in the worst-case against a 
computationally unlimited adversary. This is the set-up commonly assumed in the analysis 
of algorithms for finitely many consultants. The PM (Pascal Matrix) algorithms can be 
seen as a generalization of the Halving algorithm (Angluin [4], Barzdin and Freivalds [8])
to the case when the candidate predictors are allowed multiple errors. 

A zero-sum multistage game is a competition between two players. Popular examples
include chess, checkers and backgammon. An on-line algorithm, one performing a multi-round
"conversation" with the user (as opposed to an off-line algorithm, which may be seen as
answering unrelated questions), may be looked upon as the computer's strategy in a multistage
game between the computer and its user, in which the computer is challenged to meet
some performance criterion, measured by the Cost function in the following. The optimal
worst-case strategy $\sigma^*$ for the computer is one which, for every state S that may be reached
in the game and for every input sequence I, incurs a cost no greater than

$$\inf_{\sigma \in \mathrm{Strategies}} \; \sup_{I \in \mathrm{Inputs}} \mathrm{Cost}[\sigma(S, I)]. \qquad (1.1)$$

The notion of state may be seen as capturing the relevant information about the history
of an interaction between the computer and its user, as well as information known to the
computer that is not part of this interaction. The reader is referred to Section 1.3 for
formal definitions of the game theoretic terms. The infimum $\inf_{\sigma} \sup_{I} \mathrm{Cost}[\sigma(S, I)]$
need not be attained, i.e. a strategy achieving $\min_{\sigma} \sup_{I} \mathrm{Cost}[\sigma(S, I)]$ might not exist. If the measure of performance is
not the running time of the algorithm, the min-sup strategy might not be computationally
efficient, and an approximation might be necessary. Finding a min-sup algorithm within
a certain class of algorithms assures us that it is the best possible worst-case algorithm in
the class. An algorithm that attains the min sup over all computationally unlimited randomized
strategies we call an opportunistic algorithm.

Consider the problem of analyzing the performance of health care procedures (e.g.
drug administration) or, alternatively, stock market investment policies. The worst-case
analysis of algorithms is based on often unrealistic "pessimistic" assumptions, yet it may
be well justified under such circumstances as a means of risk minimization. As the medical
treatment proceeds, however, we would like the treatment algorithm to take advantage of
developments, which will typically be more favorable than the worst-case development the
treatment plan assumes a priori. The notions of opportunism and almost opportunism
can facilitate the design of such algorithms.

An algorithm A is said to be K almost opportunistic, or (K, K') almost opportunistic, if
there exist constants K, K' such that the cost of its execution starting from any reachable state S satisfies

$$\sup_{I \in \mathrm{Inputs}} \mathrm{Cost}_A(S, I) \le K \inf_{\sigma \in \mathrm{Strategies}} \sup_{I \in \mathrm{Inputs}} \mathrm{Cost}[\sigma(S, I)] + K',$$

where the suprema range over all possible inputs. We derive an efficiently solvable equation that gives the value of
states in some expert consulting games and upper bounds this value in other games. Using
this formula, we design algorithms which are opportunistic for some game classes and
almost opportunistic for others.

The research efforts of theoreticians are dedicated to generating and evaluating algorithmic
solutions. The most commonly used measure of algorithms' quality is their asymptotic
performance. Yet this measure is not sensitive enough to separate the LRU and FIFO paging
algorithms from LFU and LIFO, although they are not equivalent in practice. Sleator
and Tarjan [65] used competitivity to compare them theoretically at a finer resolution. The competitive
ratio of an on-line algorithm A is said to be K if there exists a constant K' for which

$$\sup_{I \in \mathrm{Inputs}} \mathrm{Cost}_A(S_0, I) \le K \inf_{\sigma \in \text{Off-strategies}} \sup_{I \in \mathrm{Inputs}} \mathrm{Cost}[\sigma(S_0, I)] + K',$$

where the infimum is taken over some set of off-line strategies, and the comparison of
performance is carried out only for the initial state $S_0$ at which the game begins. While
PM is 2 almost opportunistic for games with finitely many experts, the best algorithm
currently known for this problem, BW, is not. However, the bounds on their asymptotic as
well as competitive performance are identical. Thus, almost opportunism is a complexity
measure that in some cases is more sensitive than other known measures of algorithms'
quality.

One of the main motivations for exploring lower bounds is that they tell us how far known
solutions of an algorithmic problem are from its "optimal" solution. Opportunism and
competitivity are more precise indicators of this "closeness" than the commonly used comparison
of asymptotic lower and upper bounds, since they do not neglect constants. However,
reaching a competitive ratio of one for a problem is an unrealistic goal in most cases, as
knowledge of future inputs seems to be indispensable for the optimization
of an algorithm. Proving that an algorithm is opportunistic within some set of algorithms is the
most realistic, yet precise, method at our disposal of stating that an on-line algorithm cannot be
improved upon in the worst case.

The advantage of the PM algorithms in this respect offers evidence in favor of updating expert weights
in every round of the game, rather than only in rounds in which the manager errs. We
prove the PM algorithms achieve similar bounds when the decision domain is a finite 
set of arbitrary size. This generalizes the problem previously addressed by Littlestone 
and Warmuth [44] and Cesa-Bianchi et al. [18] of decision domains of size two ("yes/no" 
questions). The PM algorithms can also be applied to tracing "good" sets of experts of 
arbitrary size > 1, if such are known to exist, and allow the manager to incorporate a 
non-uniform prior on experts' quality. 

Expert games explored herein are closely related to, and for some variants reducible
to, the well investigated (Rivest et al. [61], Pelc [60], Aslam and Dhagat [6], Spencer
and Winkler [67], Aslam [5]) Paul-Carole chip games modeling a faulty search process.
Our analysis leads to an extension of results by Rivest et al. [61] to cover searchers that
incorporate an arbitrary prior on the values searched. It yields the value of the continuous
Mistake Bound game (Version A in Spencer and Winkler [67]), implying a lower bound on
the performance of Paul in the discrete game. It yields the same necessary condition for
Paul's victory as that specified by Spencer [66], and a different sufficient condition.

The heart of our proof method consists of establishing that the expert consulting problem
can be modeled by a chip game (see e.g. Aslam and Dhagat [6]) in which the chooser
tries to lengthen the game. We proceed to prove strong results about non-atomic expert
consulting chip games. These imply somewhat weaker results for the interesting class of
discrete games: atomic games that represent consulting finitely many experts. Our results
are similar to those established by Cesa-Bianchi et al. [18] for consulting finitely many experts
and those by e.g. Spencer [66] for Paul-Carole games, showing the close relationship
between these two problems.
It is said that "the hardest decisions in life are the least important ones". This principle
is accounted for by the observation that a decision is difficult to make when the options
presented are of similar utility. Our proof suggests that such decisions are
hard in yet another way: when the options are hard to differentiate,
the decision maker may be facing an adversarial choice.

Section 1.2 includes a few definitions and conventions. Section 1.3 begins with some facts
about multistage games, and moves on to define expert consulting games. In Section 1.4
we make observations about the adversary's optimal strategies. Then Section 1.5 uses
these to derive the values of some games. Knowing these values allows a formulation of
algorithms for the expert manager. Section 1.6 discusses the efficient implementation of
these algorithms and their absolute performance. Section 1.7 compares the PM algorithms
to the already known BW and WM algorithms for the same problem and extends them
to decision domains of size > 2. It also addresses the implications of our results for the
analysis of faulty search processes.

1.2 Definitions, Notations, Conventions 

Notation 1.2.1 Denote by

$$a \gg n$$

the result of shifting vector a right n positions and shifting in n zeros on the left. E.g.

$$(0,0,1,2,3) \gg 2 = (0,0,0,0,1,2,3).$$

Denote by

$$a \gg_{LP} n$$

the result of shifting vector a right n positions while preserving its length. E.g.

$$(0,0,1,2,3) \gg_{LP} 2 = (0,0,0,0,1).$$
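The two right shifts of Notation 1.2.1 are easy to state as list operations. The following is a minimal Python sketch in which vectors are plain lists; the function names are hypothetical.

```python
from typing import List


def shift_right(a: List[float], n: int) -> List[float]:
    """a >> n: shift right by n positions, shifting n zeros in on the left.
    The result is n entries longer than a."""
    return [0] * n + list(a)


def shift_right_lp(a: List[float], n: int) -> List[float]:
    """a >>_LP n: the same shift truncated back to the original length,
    so the n rightmost entries of a are lost."""
    return shift_right(a, n)[:len(a)]


print(shift_right([0, 0, 1, 2, 3], 2))     # [0, 0, 0, 0, 1, 2, 3]
print(shift_right_lp([0, 0, 1, 2, 3], 2))  # [0, 0, 0, 0, 1]
```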

Occasionally, as implied by the context, we may use $\gg$ to refer to shifts that preserve
vectors' lengths.

Similarly, $\ll$ and $\ll_{LP}$ denote left shifts.

The shift operation can be applied to matrices as well, row-wise. When a matrix is
shifted the reader should assume its size is not changed unless specified otherwise explicitly.
E.g.

$$\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 2 & 1 \end{pmatrix} \ll 1 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 2 & 1 & 0 \end{pmatrix}.$$


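When we multiply matrices by vectors we use the convention that when a matrix is
multiplied by a row vector on the right, the vector should be transposed. E.g.

$$\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 2 & 1 \end{pmatrix} \cdot (0,1,2) = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 2 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}.$$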
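Both matrix conventions, the size-preserving row-wise shift and the transposition of a row vector multiplied on the right, can be checked with a few lines of Python. The sketch uses, as an assumed example, the 3 × 3 lower-triangular Pascal matrix, whose entry (i, j) is the binomial coefficient C(i, j). The helper names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb
from typing import List

Matrix = List[List[int]]


def pascal_lower(size: int) -> Matrix:
    # Entry (i, j) is C(i, j); math.comb returns 0 when j > i, so the matrix is lower-triangular.
    return [[comb(i, j) for j in range(size)] for i in range(size)]


def shift_left_rows(m: Matrix, n: int) -> Matrix:
    # Row-wise left shift by n that preserves the matrix size:
    # each row drops its first n entries and gains n zeros on the right.
    return [(list(row) + [0] * n)[n:] for row in m]


def mat_vec(m: Matrix, v: List[int]) -> List[int]:
    # A row vector multiplied on the right of a matrix is treated as a column vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


P = pascal_lower(3)            # [[1, 0, 0], [1, 1, 0], [1, 2, 1]]
print(shift_left_rows(P, 1))   # [[0, 0, 0], [1, 0, 0], [2, 1, 0]]
print(mat_vec(P, [0, 1, 2]))   # [0, 1, 4]
```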



The convention for priority of operators is: for vectors a, b and a scalar α,

$$b + \alpha a \gg 1 = b + ((\alpha a) \gg 1).$$

Notation 1.2.2 For two vectors a, b of the same length, if all the coordinates of vector a
are greater than or equal to the corresponding coordinates of vector b, i.e. $\forall i.\ a_i \ge b_i$, then this
is denoted

$$a \ge b.$$

Definition 1.2.1 A measure π on a non-empty set X is called non-atomic if for every
set $T \subseteq X$ and every positive real number $c < 1$ there exists a set $T' \subseteq T$ such that
$\pi(T') = c\,\pi(T)$. Otherwise, π is called atomic.
Notation 1.2.3

$$\binom{m}{\le k} = \sum_{i=0}^{k} \binom{m}{i}.$$
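The quantity in Notation 1.2.3 counts the subsets of an m-element set that have at most k elements. A short Python sketch (the function name is hypothetical) computes it directly from the definition.

```python
from math import comb


def binom_le(m: int, k: int) -> int:
    # C(m, <=k) = sum of C(m, i) for i = 0..k; math.comb gives 0 for i > m.
    return sum(comb(m, i) for i in range(k + 1))


print(binom_le(4, 1))  # 1 + 4 = 5
print(binom_le(4, 4))  # 2**4 = 16
```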
Notation 1.2.4

$$\mathbf{1}_j = (\underbrace{1, \ldots, 1}_{j}, 0, \ldots, 0).$$

The dimension of $\mathbf{1}_j$ is stated or implied by the context when this notation is used.

For a set X we denote $X^* = \{f : \mathcal{N} \to X\}$, $\mathcal{N}$ denoting the natural numbers, and
denote $\Re^+ = [0, \infty)$.

Definition 1.2.2 The 1-shift is a relation on $(\Re^+)^*$ that we denote by $\ge_{s1}$. For two
vectors $a, b \in (\Re^+)^*$ the relation $a \ge_{s1} b$ holds iff there exist vectors $y_1, y_2 \in (\Re^+)^*$ such
that

$$y_1 + y_2 = a,$$

$$y_1 + y_2 \gg 1 = b.$$

Definition 1.2.3 The shift order is a partial order on $(\Re^+)^*$ denoted $\ge_s$. For two vectors
$a, b \in (\Re^+)^*$ the relation $a \ge_s b$ holds iff there exist $m \ge 0$ vectors $y_1, \ldots, y_m \in (\Re^+)^*$ such
that

$$a \ge_{s1} y_1 \ge_{s1} \cdots \ge_{s1} y_m \ge b.$$

The shift order is the transitive closure of the 1-shift relation.

Spencer and Winkler [67] use an alternative equivalent definition. For two vectors
$a, b \in (\Re^+)^*$, the relation $a \ge_s b$ holds iff for all i

$$\sum_{j \le i} a_j \ge \sum_{j \le i} b_j.$$

(Add trailing zeroes as needed.)
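The prefix-sum characterization gives a direct test for the shift order. The Python sketch below pads both vectors with trailing zeros and compares running sums; the function name and the floating-point tolerance are hypothetical choices. The first example is a single 1-shift with $y_1 = (0.5)$ and $y_2 = (0.5)$, which moves half the mass of $a = (1)$ one coordinate to the right.

```python
from itertools import accumulate, zip_longest
from typing import Sequence


def shift_geq(a: Sequence[float], b: Sequence[float], tol: float = 1e-12) -> bool:
    """Return True iff a >=_s b, i.e. every prefix sum of a is at least the
    corresponding prefix sum of b (trailing zeros added as needed)."""
    padded = list(zip_longest(a, b, fillvalue=0.0))
    prefix_a = accumulate(x for x, _ in padded)
    prefix_b = accumulate(y for _, y in padded)
    return all(pa >= pb - tol for pa, pb in zip(prefix_a, prefix_b))


# a = y1 + y2 = (1) and b = y1 + y2 >> 1 = (0.5, 0.5): a 1-shift, hence a >=_s b.
print(shift_geq([1.0], [0.5, 0.5]))  # True
print(shift_geq([0.5, 0.5], [1.0]))  # False: mass cannot move left
```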

1.3 The Math of the Game 

This section quotes some definitions and facts from Game Theory. It defines the notions
of opportunistic and almost opportunistic strategies. Then it describes the particulars
of expert consulting games and defines them formally. These are the subject of discussion
in what follows.

1.3.1 Multistage Two-Person Games 

A zero-sum multistage two-person game [56] is a seven-tuple $G = (\mathcal{S}_1, \mathcal{S}_2, \mathcal{S}_T, S_0, c, M_1, M_2)$.
The set $\mathcal{S} = \mathcal{S}_1 \cup \mathcal{S}_2 \cup \mathcal{S}_T$ is the set of states of game G. The set $\mathcal{S}_T$ specifies the terminal
states, those at which the game terminates. Other states we call non-terminal. State
$S_0 \in \mathcal{S}_1 \cup \mathcal{S}_2$ is the game's starting state. Depending on whether $S_0 \in \mathcal{S}_1$ or $S_0 \in \mathcal{S}_2$, either
the first or the second player makes the first move. The value function $c : \mathcal{S}_T \to \Re$ determines
the value of terminal states for the first player. Intuitively, this is the "sum" the second
player pays the first when a game ends. We often look at games from the second player's
perspective, calling the same function the second player's cost, hence the c notation. We
may also refer to it as specifying the players' payoffs. The legal moves of the respective
players in each state are specified by the functions $M_i : \mathcal{S}_i \to 2^{\mathcal{S}_{3-i} \cup \mathcal{S}_T} \setminus \{\emptyset\}$. We denote by
$2^T$ the set of subsets of set T. We only refer to zero-sum two-person games, which we may
simply call multistage games in the following.

A game starts at state So and proceeds in rounds. In each round one of the players 
alternately makes a move which modifies the game's state. A move is selected by a player 
from the set of legal moves available to him that is specified by his Mi function. It depends 
on the game's state at the beginning of the round. 

A state is reachable if some game play arrives at that state. A pure strategy for a player
specifies a legal move for each state reachable from the start state $S_0$. Formally, we may
define a reachable-set-strategy pair $(\mathcal{S}_{\sigma^1}, \sigma^1)$, where $\sigma^1$ is a strategy assigning to each
$s \in \mathcal{S}_{\sigma^1}$ a move $\sigma^1(s) \in M_1(s)$, and $\mathcal{S}_{\sigma^1}$ is the set of states that may be reached when the first
player plays $\sigma^1$ against some (at least one) strategy of the second player. A mixed strategy
specifies a probability distribution over legal moves for all reachable states.

For, possibly mixed, strategies $\sigma_1, \sigma_2$ of the respective players denote by $V(G|\sigma_1, \sigma_2)$ the
expected payoff to the first player when these strategies are played. The first player's value
of game G with respect to $\Sigma_1, \Sigma_2$ is

$$V^1(G|\Sigma_1, \Sigma_2) = \sup_{\sigma_1 \in \Sigma_1} \inf_{\sigma_2 \in \Sigma_2} V(G|\sigma_1, \sigma_2).$$

The second player's value of game G with respect to $\Sigma_1, \Sigma_2$ is

$$V^2(G|\Sigma_1, \Sigma_2) = \inf_{\sigma_2 \in \Sigma_2} \sup_{\sigma_1 \in \Sigma_1} V(G|\sigma_1, \sigma_2).$$

A game G is said to have a value with respect to strategy sets $\Sigma_1, \Sigma_2$ if both players' values
of G are equal:

$$V^1(G|\Sigma_1, \Sigma_2) = V^2(G|\Sigma_1, \Sigma_2).$$

This common value we then call the value of G with respect to $\Sigma_1, \Sigma_2$ and denote $V(G|\Sigma_1, \Sigma_2)$.
When we neglect to mention relative strategy sets, the default sets are the full sets of players'
strategies.
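For intuition about the two values, consider the simplest special case. In a single-stage game each player makes one move, and both players are restricted to pure strategies. Then $\Sigma_1$ and $\Sigma_2$ are the rows and columns of a matrix giving the first player's payoff. This restriction and the function names are assumptions of the sketch. It shows that $V^1 < V^2$ can hold strictly, so a value need not exist over pure strategies.

```python
from typing import List

Payoff = List[List[float]]  # payoff[r][c]: first player's payoff for row r, column c


def first_player_value(payoff: Payoff) -> float:
    # V^1 = max over rows of the worst case over columns.
    return max(min(row) for row in payoff)


def second_player_value(payoff: Payoff) -> float:
    # V^2 = min over columns of the largest payoff the first player can obtain against it.
    return min(max(column) for column in zip(*payoff))


matching_pennies = [[1, -1], [-1, 1]]
print(first_player_value(matching_pennies), second_player_value(matching_pennies))  # -1 1

saddle = [[3, 1], [4, 2]]
print(first_player_value(saddle), second_player_value(saddle))  # 2 2
```

Allowing mixed strategies gives matching pennies the value 0, as the Minimax Theorem [56] guarantees for finite games.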

A player's strategy is optimal under pessimistic assumptions if for any reachable game
state it performs no worse than any other strategy of the player in the worst case, i.e.
against the opponent's most unfavorable strategy. If they exist, we call such strategies
minimax strategies. Formally, $\sigma_1^\circ$ is a minimax strategy of the first player with respect to
$\Sigma_1, \Sigma_2$ if it satisfies

$$\inf_{\sigma_2 \in \Sigma_2} V(G|\sigma_1^\circ, \sigma_2) = V^1(G|\Sigma_1, \Sigma_2).$$

Similarly, $\sigma_2^\circ$ is a minimax strategy of the second player with respect to $\Sigma_1, \Sigma_2$ if it satisfies

$$\sup_{\sigma_1 \in \Sigma_1} V(G|\sigma_1, \sigma_2^\circ) = V^2(G|\Sigma_1, \Sigma_2).$$

A finite game, one which is guaranteed to terminate in a finite number of moves and in
which the set of moves available to the players in each state is finite, is guaranteed by the
famed Minimax Theorem [56] to have a value $V(G) \in \Re \cup \{\infty, -\infty\}$ when mixed strategies
are allowed. Both players in such a game have, possibly mixed, minimax strategies. Yet if
pure strategies guarantee the game's value, then neither player can improve upon them by
using mixed strategies. Such games can be played optimally under worst-case assumptions
without resorting to randomization.

A computable strategy may be called an algorithm. 

For an arbitrary state $S \in \mathcal{S}$ denote by G(S) a game identical to G except for the
starting state, which is S. Let $\Sigma_1, \Sigma_2$ denote strategy sets of the respective players. For a
strategy $\sigma^1$ let $\mathcal{S}(\sigma^1, \Sigma_2)$ denote the states $S \in \mathcal{S}$ that are reachable when the first player
plays $\sigma^1$ and the second player plays a strategy in $\Sigma_2$. If for any $S \in \mathcal{S}(\sigma^1, \Sigma_2)$ strategy $\sigma^1$
guarantees the first player a payoff in game G(S) that is as high as that guaranteed by any
other strategy in the set $\Sigma_1$ when the second player is restricted to strategies in $\Sigma_2$, we call
strategy $\sigma^1$ opportunistic with respect to $(\Sigma_1, \Sigma_2)$ in game G. Formally, $\sigma^1$ is opportunistic
with respect to $\Sigma_1, \Sigma_2$ if for every $S \in \mathcal{S}(\sigma^1, \Sigma_2)$

$$\inf_{\sigma_2 \in \Sigma_2} V(G(S)|\sigma^1, \sigma_2) = \sup_{\sigma_1 \in \Sigma_1} \inf_{\sigma_2 \in \Sigma_2} V(G(S)|\sigma_1, \sigma_2).$$

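Similarly, a strategy of the second player $\sigma^2 \in \Sigma_2$ is opportunistic with respect to $\Sigma_1, \Sigma_2$
in game G if for every state S reachable when the second player plays $\sigma^2$ and the first player
plays a strategy in $\Sigma_1$

$$\sup_{\sigma_1 \in \Sigma_1} V(G(S)|\sigma_1, \sigma^2) = \inf_{\sigma_2 \in \Sigma_2} \sup_{\sigma_1 \in \Sigma_1} V(G(S)|\sigma_1, \sigma_2).$$

We call a strategy opportunistic in game G when it is opportunistic with respect to the full
sets of strategies available to the players. A strategy that is an algorithm may be called
an opportunistic algorithm.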

Let $S_0, S_1, S_2, \ldots, S_k$ be the history of a game play between the two players. Strategy
$\sigma^1$ of the first player is opportunistic with respect to $\Sigma_1, \Sigma_2$ if for any history that is possible
when the first player plays $\sigma^1$ and the second player is restricted to strategies in $\Sigma_2$, the
sequence $(V^1(G(S_i)|\Sigma_1, \Sigma_2))_{i=0}^{k}$ is monotonic non-decreasing. A strategy $\sigma^2$ of the second
player is opportunistic if for all histories satisfying analogous conditions the sequence of
values $(V^2(G(S_i)|\Sigma_1, \Sigma_2))_{i=0}^{k}$ is monotonic non-increasing.

A strategy $\sigma^1 \in \Sigma_1$ is said to be (K, K') almost opportunistic with respect to $(\Sigma_1, \Sigma_2)$
in game G, or K almost opportunistic with respect to $(\Sigma_1, \Sigma_2)$ in game G, if there exist fixed
real constants $K, K' \in \Re$ such that for any reachable state $S \in \mathcal{S}(\sigma^1, \Sigma_2)$

$$\inf_{\sigma_2 \in \Sigma_2} V(G(S)|\sigma^1, \sigma_2) \ge K \sup_{\sigma_1 \in \Sigma_1} \inf_{\sigma_2 \in \Sigma_2} V(G(S)|\sigma_1, \sigma_2) + K'.$$

Similarly, a strategy $\sigma^2 \in \Sigma_2$ is (K, K') almost opportunistic with respect to $(\Sigma_1, \Sigma_2)$ in game
G, or K almost opportunistic with respect to $(\Sigma_1, \Sigma_2)$ in game G, if there exist fixed real
constants $K, K' \in \Re$ such that for any state S reachable when the second player plays $\sigma^2$ and
the first player plays a strategy in $\Sigma_1$,

$$\sup_{\sigma_1 \in \Sigma_1} V(G(S)|\sigma_1, \sigma^2) \le K \inf_{\sigma_2 \in \Sigma_2} \sup_{\sigma_1 \in \Sigma_1} V(G(S)|\sigma_1, \sigma_2) + K'.$$

We call a strategy almost opportunistic in G when it is almost opportunistic with respect
to the full sets of strategies available to the players. A strategy that is an algorithm may
be called an almost opportunistic algorithm.

1.3.2 Expert Consulting Games - Common Rules, Definitions 

In the game we address, the manager, called Alice, consults a set X of experts. A probability
measure π is defined on X, that is, $\pi(X) = 1$. We are interested in the worst-case analysis
of Alice's algorithm and assume the experts are managed by an adversary. We consider
those set-ups in which the advice the adversary is allowed to give through an expert at some
point in the game is limited only by the count of mistakes that expert made in previous
rounds. In other expert consulting set-ups, the game's rules may restrict the adversary's choice
of an expert's advice not only by the number of mistakes each expert made, but also by the
rounds in which those mistakes were made. The state of the game after round i is represented
by a triple

$$S_i = (E_i, i, e^i),$$

where $E_i \in \mathcal{N}$ is the number of errors made by Alice (zero is considered a natural number);
$i \in \mathcal{N}$ is the number of rounds played in the game; and $e^i \in [0, 1]^*$ is the expert state vector. The
coordinates of $e^i$ are real numbers that sum to one. Coordinate $e^i_j$ is the measure of the
set of experts that made j mistakes in the first i rounds.

The initial state of the game may be $S_0 = (0, 0, (1))$. This state represents the fact that
all experts have accumulated zero errors at the start of the game, expressing no preference
of Alice among them. Experts that have fewer accumulated errors are attributed greater
weight by the algorithm. Initial state vectors other than (1) may be used to represent a
prior Alice has on the experts' quality. For "experts" who are programs or functions, the
prior may be used to give a preference to simpler models of the data.

Denote by $M^i(x)$ the number of mistakes expert x is charged with after round i, and
denote by $A^i_j$ the set of experts that are charged with j mistakes in the first i rounds of
the game. The number of mistakes expert x is charged with is the sum of $M^0(x)$, which
corresponds to Alice's prior on x's performance, plus the number of mistakes x made in
the game.

At round $i$, the adversary chooses a vector $n^i \in \Re^*$ representing the weight of experts voting for option one (here we assume a boolean decision domain with two options, 1 and 2) such that $n^i \le e^i$. The other experts, whose weight may be represented by $e^i - n^i$, are assumed to vote for the other option. We call $n^i$ a split. Although game states represent only the error counts, neglecting experts' identities, they may correspond to an underlying set of experts $X$. Thus a split $n^i$ corresponds to a set of experts voting for option one, $X^{i,1} \subseteq X$, such that $n^i_j = \pi\{x \in X^{i,1} : M^i(x) = j\}$.

Next Alice chooses her decision $d_i \in \{1, 2\}$, after which the adversary reveals the correct answer. An absolute game's state vector is then updated according to the following rules, where $v \gg 1$ denotes the vector $v$ shifted one coordinate to the right, $(v \gg 1)_j = v_{j-1}$:

If the correct answer is "1" then: $e^{i+1} = o^{i,1}(n^i) = n^i + (e^i - n^i) \gg 1$.

If the correct answer is "2" then: $e^{i+1} = o^{i,2}(n^i) = e^i - n^i + n^i \gg 1$.

We call $o^{i,1}, o^{i,2}$ the options Alice is presented with.
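The following is a minimal sketch of one round of the update rule. It assumes a state vector is a Python list of exact fractions indexed by mistake count. The helper names `shift`, `pad` and `options` are hypothetical and do not come from the thesis. The final assertion mirrors the identity $o^1 + o^2 = e + e \gg 1$, which is used later in the chapter.

```python
from fractions import Fraction


def shift(v):
    """v >> 1: every coordinate moves up by one index (one more mistake)."""
    return [Fraction(0)] + list(v)


def pad(v, length):
    """Append zero coordinates so that v has the given length."""
    return list(v) + [Fraction(0)] * (length - len(v))


def options(e, n):
    """Return the options (o1, o2) that split n presents in state e.

    o1 = n + (e - n) >> 1   (answer "1": experts outside n are charged a mistake)
    o2 = e - n + n >> 1     (answer "2": experts in n are charged a mistake)
    """
    if len(n) != len(e) or any(not 0 <= nj <= ej for nj, ej in zip(n, e)):
        raise ValueError("a split must satisfy 0 <= n <= e coordinate-wise")
    size = len(e) + 1
    rest = [ej - nj for ej, nj in zip(e, n)]
    o1 = [a + b for a, b in zip(pad(n, size), shift(rest))]
    o2 = [a + b for a, b in zip(pad(rest, size), shift(n))]
    return o1, o2


half = Fraction(1, 2)
e = [half, half]            # half the experts have 0 mistakes, half have 1
n = [half, Fraction(0)]     # the error-free experts vote for option one
o1, o2 = options(e, n)
# Both options together always equal e + (e >> 1).
assert [a + b for a, b in zip(o1, o2)] == [a + b for a, b in zip(pad(e, 3), shift(e))]
```

Each option again sums to one, so it is a legal state vector.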

Definition 1.3.1 We denote by $\mathcal{G}^{XXX}$ a family of expert consulting games. The various families are distinguished by the XXX superscript. A game $G \in \mathcal{G}^{XXX}$ is specified by a tuple of up to six coordinates $(X, \pi, e^0, \eta, M, l)$:
- $X$ is the set of experts;
- $\pi$ is a probability measure on $X$;
- $e^0 \in \Re^*$ is the expert start state, satisfying $\sum_j e^0_j = 1$;
- $\eta : \mathbb{N} \to [0,1]$ is the share of "good" experts, those that make fewer than $M(i)$ mistakes in the first $i$ rounds;
- $M : \mathbb{N} \to \mathbb{N}$ is the Mistake Bounding function;
- $l \in \mathbb{N}$ is the length of the game.

When the length is specified, it is fixed a priori, independent of play. The score in a terminated absolute game is given by $E^l$. We use $G(e)$ to denote a game identical to $G$ except, possibly, for its start state; $G(e, \pi)$, $G(e, M)$, etc. have similar semantics. We call Alice (the manager) the player that attempts to minimize the score, and her opponent the adversary. The notations $X[G], \pi[G], \ldots$ are used to distinguish the coordinates of the game-defining tuple.

Notation 1.3.1 Denote by $\mathcal{A}_\mathcal{D}(G, V, S, l)$ the set of adversarial strategies against which Alice is expected to make at least $V$ mistakes in game $G$ of length $l$ starting at state $S$, whatever strategy she plays. Similarly, $\mathcal{A}_\mathcal{A}(G, V, S, l)$ is the set of Alice's strategies for which she is expected to make at most $V$ mistakes in a game $G$ of $l$ steps starting at state $S$.

1.3.3 Expert Consulting Games - Variations 

Having discussed the commonalities of expert consulting games, we now turn to their 
idiosyncrasies. 

Definition 1.3.2 We call game $G$ non-atomic if $\pi[G]$ is non-atomic, and atomic otherwise. For an atomic game, we call a game with identical parameters but a non-atomic measure the associated non-atomic game.

If we restrict our attention to sets of experts of cardinality no greater than the continuum, then by a standard theorem [58, Proposition 26.2] all non-atomic probability measures that can be defined over the set of experts $X[G]$ and that agree with $\pi[G]$ are unique up to isomorphism. They are isomorphic to the Lebesgue measure on the real segment $[0,1]$. Thus, up to isomorphism, there is a single non-atomic game associated with any given game.

Definition 1.3.3 A game with $N$ experts is a game in which:
- $N$ is the size of the set of players, $|X| = N$;
- all players are assigned the same weight by the measure function, $\forall x \in X.\ \pi(x) = \frac{1}{N}$;
- function $\eta$ effectively counts the number of non-erring experts.

Families of games may vary by the knowledge the sides possess and by the amount of control they have in the game. For example, both sides may know the length of the game in advance, just one side may know it in advance, or the power to stop the game may be given to one of the sides.

We call an absolute game Mistake Bound (MB) when the adversary is required to ensure that

$$e^i \cdot \mathbf{1}_{M(i)} \ge \eta(i) \tag{1.2}$$

is satisfied when the game terminates after some round $i$. Here $\mathbf{1}_{M}$ denotes the vector whose coordinates $0, \ldots, M-1$ equal one and whose other coordinates are zero. That is, at least $\eta(i)$ of the experts make fewer than $M(i)$ mistakes in the first $i$ rounds of the game. One can also explore Prefix Mistake Bound (PMB) games, in which the adversary is required to satisfy condition (1.2) after $i$ rounds, for all $i$.
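As a rough illustration of the difference between MB and PMB games, the following sketch checks condition (1.2) on a recorded trajectory of state vectors. It assumes that `trajectory[i]` holds $e^i$, with `trajectory[0]` the start state, and that $M$ and $\eta$ are passed as Python functions of the round. All names here are hypothetical.

```python
from fractions import Fraction


def good_share(e, m):
    """e . 1_m: total weight of experts with fewer than m mistakes."""
    return sum(e[:m], Fraction(0))


def satisfies_mb(trajectory, M, eta):
    """MB game: (1.2) is only required for the final state e^i, i = len - 1."""
    i = len(trajectory) - 1
    return good_share(trajectory[i], M(i)) >= eta(i)


def satisfies_pmb(trajectory, M, eta):
    """PMB game: (1.2) must hold after every round i."""
    return all(good_share(e, M(i)) >= eta(i) for i, e in enumerate(trajectory))


def shrinking_bound(i):
    """A hypothetical mistake bound M(i) that tightens after round 1."""
    return 2 if i < 2 else 1


F = Fraction
trajectory = [[F(1)], [F(1, 2), F(1, 2)], [F(1, 4), F(1, 2), F(1, 4)]]
print(satisfies_mb(trajectory, shrinking_bound, lambda i: F(1, 4)))   # True
print(satisfies_pmb(trajectory, lambda i: 1, lambda i: F(1, 2)))      # False
```

The second call fails only because of the last round, where just a quarter of the experts are still error-free.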

Restrictions can be put on the ways in which experts' opinions interact. These can be 
deterministic or probabilistic. In particular splits may be selected by a stochastic process. 
Likewise, the correct labels may be selected by a stochastic process. To fit such games 
into the above framework, we can think of the adversary as being restricted to use the 
appropriate process to make his decisions. 

One can also let $M(l)$ or $\eta(l)$ be a random variable, thus modeling a game in which these parameters are determined stochastically, although this avenue is not explored here.

The parameters of games within a family, e.g. $X$, $\pi$, $M(l)$, $\eta(l)$, may vary from game to game.

Definition 1.3.4 We call game trees the class of binary trees whose nodes are labeled by expert state vectors, such that the children of a node labeled $e$ are labeled by the two options $o^1(n), o^2(n)$ with respect to some split $n \le e$. For all $d \in \mathbb{N}$, the nodes at depth $d$ satisfy the mistake bound condition (1.2) with $i = d$.

A tree corresponds to a deterministic algorithm $\mathcal{D}$ of the adversary if the split used to label the children of a node labeled $e$ is the split $\mathcal{D}$ chooses in state $e$.
1.4 Adversarial Logic

The results of this section apply to arbitrary Mistake Bound games. They also hold for Prefix Mistake Bound games in which $M(l)$ is monotonically non-decreasing and $\eta(l)$ is monotonically non-increasing. Herein, we denote this class of games by $\mathcal{G}$. We show there are always strategies in $\mathcal{A}_\mathcal{D}(G, V, S, l)$ that possess certain convenient properties. When presented with the task of making a decision about a move, Alice can assume w.l.o.g. that the adversary is going to use one of these easily analyzable strategies, and use this assumption to predict the outcome of his decision.

1.4.1 Generally Applicable Observations 

Claim 1.4.1 For any game $G \in \mathcal{G}$ and any $V, S, l$, the set $\mathcal{A}_\mathcal{D}(G, V, S, l)$ contains a pure strategy.

The proof uses the fact that the set of maxima of a linear function on a polygon always 
contains a vertex of that polygon. The claim implies that restricting a computationally 
unlimited adversary to use pure strategies does not improve the game for Alice. 

Definition 1.4.1 We define strategy $\mathcal{A}$ for Alice by

$$d_{\mathcal{A}}(o^1, o^2) = \arg\min_{i=1,2}\left\{v_i : v_1 = \max\{V_\mathcal{D}(o^1), V_\mathcal{D}(o^2) + 1\},\ v_2 = \max\{V_\mathcal{D}(o^1) + 1, V_\mathcal{D}(o^2)\}\right\}. \tag{1.3}$$
Claim 1.4.2 For any game $G \in \mathcal{G}$ and any $V, S, l$,

$$\mathcal{A} \in \mathcal{A}_\mathcal{A}(G, V, S, l).$$

An Alice that knows the value of any state can choose her moves optimally, by predicting 
her opponent's choices. 
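The decision rule (1.3) only needs the values of the two options. The following is a minimal sketch that assumes those values are already available as integers; Section 1.5 discusses how to compute them. The function name is hypothetical, and ties are broken towards decision 1, which is an arbitrary choice.

```python
def alice_decision(v1, v2):
    """Strategy A of Definition 1.4.1, given V(o^1) = v1 and V(o^2) = v2.

    Deciding i exposes Alice to max{V(o^i), V(o^(3-i)) + 1}: the adversary either
    agrees (the game moves to o^i) or disagrees (o^(3-i), plus one mistake).
    Returns the decision and the value Alice secures by making it.
    """
    cost = {1: max(v1, v2 + 1), 2: max(v1 + 1, v2)}
    decision = min((1, 2), key=cost.get)  # min keeps the first minimum: tie -> 1
    return decision, cost[decision]


print(alice_decision(9, 8))   # (1, 9)
print(alice_decision(8, 9))   # (2, 9)
```

In effect, Alice votes for the option whose rival has the lower value, so that a disagreement leaves the adversary in the weaker state.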

We can distinguish four types of deterministic adversarial moves in the split-choosing 
stage: 
A-A Agreement splits. Splits for which the adversary will agree with whatever decision Alice makes.

A-D Agreement-disagreement splits. Splits for which the adversary will agree with one decision of Alice, but disagree with the other.

S Stall splits: $n(e) = e$ or $n(e) = \mathbf{0}$. The adversary agrees with the decision of the experts.

D-D Disagreement splits. For these the adversary will disagree with any decision Alice makes.

Claim 1.4.3 For any game $G \in \mathcal{G}$, any start state $S$, and any game length $l \in \mathbb{N}$,

$$V_\mathcal{D}(G(S, l)) \ge V_\mathcal{D}(G(S, l-1)).$$

Proof: The adversary can always start the game with a stall move. □

Intuitively, vectors that are smaller with respect to the shift order can be interpreted 
as corresponding to more evolved positions in the expert consulting game. Thus the next 
claim seems to follow naturally from the previous one. It allows us to say that the value of 
games respects the shift order. 

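Claim 1.4.4 For any game $G \in \mathcal{G}$ and two game states $S^1 = (E, i, a)$ and $S^2 = (E, i, c)$ such that $a \ge_s c$:

$$\mathcal{A}_\mathcal{D}(G, V, S^2, l) \ne \emptyset \Rightarrow \mathcal{A}_\mathcal{D}(G, V, S^1, l) \ne \emptyset. \tag{1.4}$$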

Proof: Fix a strategy $\mathcal{A}$ for Alice. Calling the number of mistakes Alice makes the adversary's payoff, we prove that

for any pure adversarial strategy $\mathcal{D}$ there exists a pure strategy $\mathcal{D}'$ such that the expected payoff of $\mathcal{D}'$ against $\mathcal{A}$ starting with expert state vector $a$ is at least as high as the expected payoff of $\mathcal{D}$ against $\mathcal{A}$ starting with expert state vector $c$.

Then Claim 1.4.1 completes the proof.

Note that only the adversary can benefit from prolonging the game. For vectors $c$ satisfying the game termination condition the game cannot proceed, and the claim holds. This establishes the base of an induction on the game's length $l$.

Assume the inductive statement holds for $l < L$. For $l = L$, let $\mathcal{D} \in \mathcal{A}_\mathcal{D}(G, V, S^2, l)$, and let $n$ be the split that $\mathcal{D}$ chooses in state $c$. Then, by the definition of the shift order, there exist vectors $a_1, a_2, \ldots, a_k$ such that

$$a \ge_s c \Rightarrow a \ge_{s1} a_1 \ge_{s1} \cdots \ge_{s1} a_k = c.$$

By induction on $k$, as shown below, it follows that $\mathcal{D}'$ can choose a split $n'$ for state $a$ such that both options presented to Alice by $\mathcal{D}'$ in state $a$ are greater than or equal, with respect to the shift order, to those presented in state $c$ by $\mathcal{D}$. Hence, the inductive assumption may be applied to these options, completing the proof.

For $k = 0$ let $n' = n$.

For the inductive step, take $a_1$ as above. It satisfies $a \ge_{s1} a_1$, hence there exist $y^1, y^2$ such that

$$y^1 + y^2 = a, \qquad y^1 + y^2 \gg 1 = a_1.$$

For a vector $b \le a_1$ denote

$$\bar{b}_j = \min\{b_j, y^1_j\}, \qquad b' = \bar{b} + (b - \bar{b}) \ll 1.$$

Now $\bar{b} \le y^1$ and $b - \bar{b} \le a_1 - y^1 = y^2 \gg 1$. Hence $b' \le a$, which makes $b'$ a legal split of $a$. Further,

$$a - a_1 = y^2 - y^2 \gg 1 \ \ge_s\ (b - \bar{b}) \ll 1 - (b - \bar{b}) = b' - b,$$

hence $a - b' \ge_s a_1 - b$. Together with $b' \ge_s b$, this implies that the two options offered when $b'$ is presented in state $a$, namely $a - b' + b' \gg 1$ and $(a - b') \gg 1 + b'$, are respectively bigger with respect to the shift order than the options $a_1 - b + b \gg 1$ and $(a_1 - b) \gg 1 + b$. □
It follows that

Corollary 1.4.1 For any game $G \in \mathcal{G}$,

$$a \ge_s c \Rightarrow V(G(a)) \ge V(G(c)). \tag{1.5}$$

Next we show Claim 1.4.4 implies that when the adversary chooses to agree with Alice 
the adversary's state of affairs does not improve and may even deteriorate, provided Alice 
plays rationally. Thus the adversary's strategic perspectives are not harmed when he is 
restricted to always disagree with Alice. First observe that 

Corollary 1.4.2 For any game $G \in \mathcal{G}$, a non-empty set of $\mathcal{A}_\mathcal{D}$-strategies must contain a strategy that does not use agreement splits.

Proof: Since stall splits are always available to the adversary, all agreement moves can 
be replaced by stall moves, without compromising the adversary's payoff. This follows from 
Claim 1.4.4, as agreement moves at least weakly decrease the state's value with respect to 
the shift order. □ 

Next we prove that A-D splits need not be used either. 

Claim 1.4.5 For any game $G \in \mathcal{G}$ and any $V, S, l$ there is an $\mathcal{A}_\mathcal{D}(G, V, S, l)$-strategy that does not use A-D splits.

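Proof: By the definition of $V$:

$$V(e, l \mid \mathcal{D}, \cdot) = \max_{n \le e} \min\Big\{ \max\{V(o^1, l-1 \mid \mathcal{D}, \cdot) + 1,\ V(o^2, l-1 \mid \mathcal{D}, \cdot)\},\ \max\{V(o^1, l-1 \mid \mathcal{D}, \cdot),\ V(o^2, l-1 \mid \mathcal{D}, \cdot) + 1\} \Big\}. \tag{1.6}$$

Hence, for all splits $n \le e$,

$$\min\{V(o^1, l-1 \mid \mathcal{D}, \cdot),\ V(o^2, l-1 \mid \mathcal{D}, \cdot)\} \le V(e, l \mid \mathcal{D}, \cdot) - 1. \tag{1.7}$$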
If a given split satisfies

$$\min\{V(o^1, l-1 \mid \mathcal{D}, \cdot),\ V(o^2, l-1 \mid \mathcal{D}, \cdot)\} = V(e, l \mid \mathcal{D}, \cdot) - 1,$$

then, by the definition of the game's value, the adversary can w.l.o.g. use it as a D-D split. This is preferable to using it as an A-D split. Thus a rationally chosen A-D split must satisfy

$$\min\{V(o^1, l-1 \mid \mathcal{D}, \cdot),\ V(o^2, l-1 \mid \mathcal{D}, \cdot)\} \le V(e, l \mid \mathcal{D}, \cdot) - 2.$$

In fact, by the definition of the game's value (1.6), it must satisfy

$$\min\{V(o^1, l-1 \mid \mathcal{D}, \cdot),\ V(o^2, l-1 \mid \mathcal{D}, \cdot)\} = V(e, l \mid \mathcal{D}, \cdot) - 2,$$
$$\max\{V(o^1, l-1 \mid \mathcal{D}, \cdot),\ V(o^2, l-1 \mid \mathcal{D}, \cdot)\} = V(e, l \mid \mathcal{D}, \cdot).$$

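The second equation follows from the first by the definition of the game's value. Equality holds by Claims 1.4.3 and 1.4.4. Assume w.l.o.g. that

$$V(o^1, l-1 \mid \mathcal{D}, \cdot) = V(e, l \mid \mathcal{D}, \cdot).$$

This means that Alice can decide 1 rationally, leading to an agreement. Suppose the adversary instead presents the stall split $n' = e$, whose options are $o^1(n') = e$ and $o^2(n') = e \gg 1$. Then, by Claim 1.4.4,

$$\max\{V(o^1(n'), l-1 \mid \mathcal{D}, \cdot),\ V(o^2(n'), l-1 \mid \mathcal{D}, \cdot) + 1\} = \max\{V(e, l-1 \mid \mathcal{D}, \cdot),\ V(e \gg 1, l-1 \mid \mathcal{D}, \cdot) + 1\} = V(e, l \mid \mathcal{D}, \cdot),$$

$$\max\{V(o^1(n'), l-1 \mid \mathcal{D}, \cdot) + 1,\ V(o^2(n'), l-1 \mid \mathcal{D}, \cdot)\} = \max\{V(e, l-1 \mid \mathcal{D}, \cdot) + 1,\ V(e \gg 1, l-1 \mid \mathcal{D}, \cdot)\} = V(e, l \mid \mathcal{D}, \cdot) + 1.$$

Alice still decides 1 rationally, leaving the game in state $e$. By Claim 1.4.4, $e \ge_s o^1$ implies this move is no worse for the adversary. □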
Thus we have proven:

Theorem 1.4.1 For any game $G \in \mathcal{G}$ and any $V, S, l$ there is an $\mathcal{A}_\mathcal{D}(G, V, S, l)$-strategy that performs only S and D-D moves.

The adversary can determine the rounds of the game at which he executes the D-D 
moves. In a Mistake Bound Game, there is no reason to delay their execution. Thus a 
generic optimal adversarial strategy for MB games is to execute rationally justified D-D 
moves while available, and then stall to the end of the game. In particular if no D-D moves 
will become available in the future, the adversary and Alice can choose to stop the game 
without incurring a loss. 

In a general Prefix Mistake Bound game, the optimal strategy of the adversary might need to take global considerations into account when deciding whether to execute a D-D or an S move. Yet if $M(l)$ is non-decreasing and $\eta(l)$ non-increasing, D-D moves can be executed whenever they become available, as a legal game state cannot become illegal when the game advances. Alternatively, for arbitrary $M(l), \eta(l)$, the adversary can execute all his D-D moves consecutively at the last $V(G)$ moves of the game. This can be proven by induction, using the fact that a D-D move followed by an S move can be swapped with it without violating any mistake bound restrictions.

Looking at game trees, we have so far argued that we only need to consider trees with D-D and S moves to compute the value of a game. The value of a game is given by the maximum, over such trees, of the minimum, over their leaves, of the number of D-D moves on the path to the leaf.

1.4.2 Non-Atomic Games

Pascal Matrices 

Definition 1.4.2 Denote by $P^n_l$ a Pascal Matrix of size $n \times n$ and order $l$. The entry in row $i$ and column $j$ of this matrix, for $0 \le i < n$ and $0 \le j < n$, is

$$\frac{1}{2^l}\binom{l}{i-j}.$$

Comment: For $c < 0$ and $c > l$ we have $\binom{l}{c} = 0$.

We index rows and columns of matrices starting at 0. We may neglect to mention the size of the matrix, $n$, as it is often not essential in our application, provided $n$ is sufficiently large (larger than $M$).

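Here, for example, are a couple of $4 \times 4$ Pascal Matrices:

$$P^4_2 = \frac{1}{4}\begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \end{pmatrix}, \qquad P^4_1 = \frac{1}{2}\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}.$$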

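For a vector $a$ and any matrix $Q$,

$$Q(a \gg 1) = (Q \ll 1)\,a,$$

where $Q \ll 1$ denotes $Q$ with its columns shifted one position to the left. This holds since, for an $n \times n$ matrix $Q = [q_{ij}]$ and $a = (a_1, \ldots, a_n)$,

$$[q_{ij}] \cdot ((a_1, \ldots, a_n) \gg 1) = \left(\sum_{j=2}^{n} q_{ij}\, a_{j-1}\right)_{i=1}^{n} = (Q \ll 1)\,a.$$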

Now observe that

$$\frac{1}{2}P_l + \frac{1}{2}(P_l \ll 1) = P_{l+1}. \tag{1.8}$$



It follows from the facts quoted above that

$$P_l\left(\frac{1}{2}a + \frac{1}{2}a \gg 1\right) = \frac{1}{2}P_l a + \frac{1}{2}P_l(a \gg 1) = \frac{1}{2}P_l a + \frac{1}{2}(P_l \ll 1)\,a = P_{l+1}a.$$

Thus

$$P_l\left(\frac{1}{2}a + \frac{1}{2}a \gg 1\right) = P_{l+1}a. \tag{1.9}$$
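The identity (1.9), and Claim 1.4.6 below, can be checked numerically. The following sketch assumes exact `Fraction` arithmetic and truncates vectors to the matrix size $n$: mass shifted past coordinate $n-1$ is dropped, which does not affect the first $n$ coordinates. `pascal`, `matvec`, `shift` and `half_move` are hypothetical helper names.

```python
from fractions import Fraction
from math import comb


def pascal(l, n):
    """n x n Pascal matrix of order l: entry (i, j) is C(l, i - j) / 2^l (0 outside 0..l)."""
    return [[Fraction(comb(l, i - j), 2 ** l) if 0 <= i - j <= l else Fraction(0)
             for j in range(n)] for i in range(n)]


def matvec(P, a):
    """Matrix-vector product with exact arithmetic."""
    return [sum((p * x for p, x in zip(row, a)), Fraction(0)) for row in P]


def shift(a):
    """a >> 1, truncated to len(a)."""
    return [Fraction(0)] + list(a[:-1])


def half_move(e):
    """State after a "half" move (split e/2): both options equal e/2 + (e >> 1)/2."""
    return [x / 2 + y / 2 for x, y in zip(e, shift(e))]


n = 6
a = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)] + [Fraction(0)] * 3
# (1.9): P_l (a/2 + (a >> 1)/2) == P_{l+1} a
for l in range(5):
    assert matvec(pascal(l, n), half_move(a)) == matvec(pascal(l + 1, n), a)

# l half moves starting from e = (1, 0, ..., 0) lead to P_l e
e = [Fraction(1)] + [Fraction(0)] * (n - 1)
state = e
for _ in range(3):
    state = half_move(state)
print(state == matvec(pascal(3, n), e))   # True
```

After three half moves the state is the binomial distribution $(1, 3, 3, 1)/8$, which is exactly the first column of $P_3$.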



An Optimal Adversarial Strategy for Non-Atomic Games

We call an adversary's move a "half" move if it presents the D-D split $\frac{1}{2}e$ at state $e$.

Claim 1.4.6 If the adversary executes $l$ "half" moves beginning with state $e$, then the game moves into state $P_l e$, regardless of Alice's strategy.

Proof: By induction on $l$, using equation (1.9). □

Notation 1.4.1 Denote by $\mathcal{D}_{1/2}$ the adversarial strategy that executes a "half" move in state $e$ if that does not cause a violation of the mistake bound condition, and that stalls otherwise.

Theorem 1.4.2 For a non-atomic game $G \in \mathcal{G}$, if $\mathcal{A}_\mathcal{D}(G, V, S, l) \ne \emptyset$ then $\mathcal{D}_{1/2} \in \mathcal{A}_\mathcal{D}(G, V, S, l)$.

Proof: For a deterministic strategy $\mathcal{D}$ of the adversary that executes D-D and stall moves only, let us look at the tree representing $\mathcal{D}$'s executions. We have argued that $V(G \mid \mathcal{D}, \cdot) \ge V$ is the minimal number of D-D moves between a depth $l$ node and the root. Replacing D-D moves by stall moves on branches Alice does not take rationally does not cause a violation of the mistake bound condition (1.2). Thus we can assume w.l.o.g. that on any path from the root to a depth $l$ leaf there are $V$ D-D moves. Now remove all nodes corresponding to stall moves from the tree. Assume that in such nodes Alice votes with all of her consultants, and that the adversary announces her and her consultants right. Then we can attach the subtree rooted at the child that is labeled identically to the parent node in place of the parent node. Denote by $e^V_1, \ldots, e^V_{2^V}$ the nodes at depth $V$. By induction on $V$ it can be shown that

$$\sum_i e^V_i = 2^V P_V e,$$

where $e$ is the start state. This holds because, for any two options at a state $e$,

$$o^1 + o^2 = e + e \gg 1.$$

For all $i = 1, \ldots, 2^V$,

$$e^V_i \cdot \mathbf{1}_M \ge \eta.$$

Thus

$$\sum_i e^V_i \cdot \mathbf{1}_M = (2^V P_V e) \cdot \mathbf{1}_M \ge 2^V \eta.$$

By Claim 1.4.6, had the adversary played $\mathcal{D}_{1/2}$, he would have executed at least $V$ successive "half" moves during the first $V$ rounds of the game. The resulting state would have satisfied the mistake bound condition at termination, securing him a payoff of $V$. □

1.4.3 Games with Finitely Many Experts 

In non-atomic games the adversary has more splits to choose from than in the corresponding atomic games. Thus

Theorem 1.4.3 For two games $G_1, G_2 \in \mathcal{G}$ identical except for the measure function, such that $\pi[G_1]$ is atomic and $\pi[G_2]$ is non-atomic,

$$V_\mathcal{D}(G_1) \le V_\mathcal{D}(G_2).$$

Next we restrict our attention to games with finitely many experts. The adversary is 
not much worse off in these games than in the associated non-atomic games. 

Consulting Finitely Many Experts in Mistake Bound Games 

Notation 1.4.2 Denote by $n^a$ the split defined by lining up the experts in order of decreasing number of accumulated errors (ties are broken arbitrarily) and choosing into $n^a$ those experts at odd positions. Call it the alternating split.
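The following is a small sketch of the alternating split for a game with $N$ experts. It assumes the state is given as integer counts per mistake bucket, with `counts[j]` experts having $j$ mistakes. Ties inside a bucket do not affect the counts, so the arbitrary tie-breaking of Notation 1.4.2 does not show up. The function name is hypothetical.

```python
def alternating_split(counts):
    """Counts of the alternating split n^a.

    Experts are lined up from the most erring bucket down to bucket 0, and the
    experts at odd positions (1st, 3rd, 5th, ...) are chosen into the split.
    """
    split = [0] * len(counts)
    position = 0  # 0-based, so an even index is an odd 1-based position
    for j in reversed(range(len(counts))):
        for _ in range(counts[j]):
            if position % 2 == 0:
                split[j] += 1
            position += 1
    return split


counts = [2, 3, 0, 0, 1]
split = alternating_split(counts)
print(split)                                         # [1, 1, 0, 0, 1]
print([2 * s - c for s, c in zip(split, counts)])    # [0, -1, 0, 0, 1]
```

In unscaled counts, every coordinate of $2n^a - e$ is $0$ or $\pm 1$, and the nonzero coordinates alternate in sign. This is the property of the alternating split that the proof of Theorem 1.4.4 relies on.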

These splits were considered by Spencer and Winkler [67]. 

Corollary 1.5.1 of Section 1.5.1 states that the value of a non-atomic game with constant mistake bound $M$ and adversarially determined length is given by the following formula (the same notation is used in both places):

$$V^{MB,cM}(G(e)) = \arg\max_l\{P_l e \cdot \mathbf{1}_M \ge \eta\}.$$

In the remainder of this section we denote $V(e, M, \eta) = V^{MB,cM}(G(e))$.

Theorem 1.4.4 For two Mistake Bound games $G_1, G_2$ identical except for the measure function, such that $G_1$ is a game with finitely many experts and $G_2$ is non-atomic,

$$V_\mathcal{D}(G_1) \ge \left\lceil \frac{1}{2} V_\mathcal{D}(G_2) \right\rceil.$$

Proof: By Theorems 1.4.1 and 1.4.2, in the non-atomic game $G_2$ the adversary w.l.o.g. executes

$$V = V(e, M, \eta) = \arg\max_l\{P_l e \cdot \mathbf{1}_M \ge \eta\} = V_\mathcal{D}(G_2(e))$$

"half" moves, for some $M$, and then stalls until the game ends. Denote the number of errors the least erring expert made getting to state $e$ by

$$i_{LE}(e) = \arg\min_j\{e_j \ne 0\}.$$

The argument now splits into two cases.

Case 1: $2(M - i_{LE}(e) - 1) \ge V$. The least erring expert is allowed to make $M - i_{LE}(e) - 1$ additional errors. The adversary can use this to make the manager incur at least this many mistakes without violating the mistake bound requirement. Thus the theorem holds for state vectors satisfying $2(M - i_{LE}(e) - 1) \ge V$.

Case 2: $2(M - i_{LE}(e) - 1) \le V - 1$. In this case we prove the theorem by induction on $V$. For vectors covered by Case 1 the base is established. Yet some states with $V = 1$ might not satisfy $2(M - i_{LE}(e) - 1) \ge V$, i.e. $i_{LE}(e) = M - 1$, that is, $e \cdot \mathbf{1}_M = e_{M-1}$. Since $V = 1$, the adversary can perform another "half" move, thus

$$P_1 e \cdot \mathbf{1}_M = \frac{1}{2} e_{M-1} \ge \eta.$$

Since $e$ is a state in a game with finitely many experts, $e_{M-1}, \eta \in \{\frac{i}{N} : i \in \mathbb{N}\}$, implying

$$\frac{1}{N}\left\lfloor \frac{N e_{M-1}}{2} \right\rfloor \ge \eta.$$

Hence the adversary can play another D-D move in a game with finitely many experts as well.

Now let us complete the inductive proof of Case 2.

Induction step: Having established the base, we prove for the step that for any $e, M, \eta$:

$$V(e, M, \eta) - 2 \le \min\{V(o^1(n^a), M, \eta),\ V(o^2(n^a), M, \eta)\}.$$

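By the definition of $V$,

$$\frac{1}{2^V}\left(\binom{V}{<(M-i)}\right)_{i=0}^{M-1} \cdot e \ge \eta, \tag{1.10}$$

where $\binom{k}{<m}$ denotes $\sum_{c<m}\binom{k}{c}$. Denote

$$e_{LE} = \left(\frac{1}{N}\right) \gg i_{LE},$$

the vector carrying the weight of a single least erring expert. The bound $2(M - i_{LE} - 1) \le V - 1$ implies that no more than half the terms in the sequence $\binom{V}{0}, \binom{V}{1}, \ldots, \binom{V}{V}$ are summed up on the left-hand side of the following expression, giving

$$\frac{1}{2^V}\left(\binom{V}{<(M-i)}\right)_{i=0}^{M-1} \cdot e_{LE} = \frac{1}{2^V}\binom{V}{<(M-i_{LE})}\frac{1}{N} \le \frac{1}{2N} \le \frac{\eta}{2}.$$

Thus by (1.10):

$$\left(\binom{V}{<(M-i)}\right)_{i=0}^{M-1} \cdot e_{LE} \le \left(\binom{V}{<(M-i)}\right)_{i=0}^{M-1} \cdot (e - e_{LE}). \tag{1.11}$$

The ratio $\binom{V-1}{<(M-c)} \Big/ \binom{V}{<(M-c)}$ increases as $c$ grows. Since $e_{LE}$ is supported on coordinate $i_{LE}$ and $e - e_{LE}$ on coordinates $i \ge i_{LE}$,

$$\frac{\left(\binom{V-1}{<(M-i)}\right)_{i=0}^{M-1} \cdot e_{LE}}{\left(\binom{V}{<(M-i)}\right)_{i=0}^{M-1} \cdot e_{LE}} \le \frac{\left(\binom{V-1}{<(M-i)}\right)_{i=0}^{M-1} \cdot (e - e_{LE})}{\left(\binom{V}{<(M-i)}\right)_{i=0}^{M-1} \cdot (e - e_{LE})}.$$

We get from (1.11):

$$\left(\binom{V-1}{<(M-i)}\right)_{i=0}^{M-1} \cdot e_{LE} \le \left(\binom{V-1}{<(M-i)}\right)_{i=0}^{M-1} \cdot (e - e_{LE}),$$

which is equivalent to

$$\sum_{j=1}^{M-i_{LE}}\left(\binom{V-1}{M-i-j}\right)_{i=0}^{M-1} \cdot e_{LE} \le \sum_{j=1}^{M-i_{LE}}\left(\binom{V-1}{M-i-j}\right)_{i=0}^{M-1} \cdot (e - e_{LE}).$$

The ratio $\binom{V-1}{<(M-c)} \Big/ \binom{V-1}{M-c-1}$ decreases as $c$ grows. Hence,

$$\left(\binom{V-1}{M-i-1}\right)_{i=0}^{M-1} \cdot e_{LE} \le \left(\binom{V-1}{M-i-1}\right)_{i=0}^{M-1} \cdot (e - e_{LE}). \tag{1.12}$$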



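Now note that for any game state $e$ and any $l \ge 1$,

$$(P_{l-1} - P_l)\,e \cdot \mathbf{1}_M = \frac{1}{2^{l}}\left(2\binom{l-1}{<(M-i)} - \binom{l}{<(M-i)}\right)_{i=0}^{M-1} \cdot e = \frac{1}{2^{l}}\left(\binom{l-1}{<(M-i)} - \binom{l-1}{<(M-i-1)}\right)_{i=0}^{M-1} \cdot e = \frac{1}{2^{l}}\left(\binom{l-1}{M-i-1}\right)_{i=0}^{M-1} \cdot e.$$

Hence

$$\begin{aligned}
P_{V-2}\left(o^1(n^a) + o^2(n^a)\right) \cdot \mathbf{1}_M &= P_{V-2}(e + e \gg 1) \cdot \mathbf{1}_M \\
&= 2P_{V-1}e \cdot \mathbf{1}_M && \text{(by (1.9))} \\
&= \left[2P_V e + 2(P_{V-1} - P_V)e\right] \cdot \mathbf{1}_M \\
&\ge 2\eta + \frac{1}{2^{V-1}}\left(\binom{V-1}{M-i-1}\right)_{i=0}^{M-1} \cdot e && \text{(by the definition of } V\text{)} \\
&\ge 2\eta + \frac{1}{2^{V-2}}\binom{V-1}{M-i_{LE}-1}\frac{1}{N}. && \text{(by (1.12))}
\end{aligned} \tag{1.13}$$

On the other hand,

$$\left|P_{V-2}\left(o^1(n^a) - o^2(n^a)\right) \cdot \mathbf{1}_M\right| = \frac{1}{2^{V-2}}\left|\left(\binom{V-2}{M-i-1}\right)_{i=0}^{M-1} \cdot (2n^a - e)\right|. \tag{1.14}$$

The condition $2(M - i_{LE} - 1) \le V - 1$ implies

$$\max_{i \ge i_{LE}}\binom{V-2}{M-i-1} = \max\left\{\binom{V-2}{M-i_{LE}-1},\ \binom{V-2}{M-i_{LE}-2}\right\}.$$

The nonzero coordinates of $2n^a - e$ equal $\pm\frac{1}{N}$ with alternating signs. Hence, bounding the sum of a sequence with alternating signs,

$$\left|P_{V-2}\left(o^1(n^a) - o^2(n^a)\right) \cdot \mathbf{1}_M\right| \le \frac{1}{2^{V-2}}\frac{1}{N}\max\left\{\binom{V-2}{M-i_{LE}-1},\ \binom{V-2}{M-i_{LE}-2}\right\} \le \frac{1}{2^{V-2}}\frac{1}{N}\binom{V-1}{M-i_{LE}-1}.$$

Together with (1.13) this means

$$\min\left\{P_{V-2}\,o^1(n^a) \cdot \mathbf{1}_M,\ P_{V-2}\,o^2(n^a) \cdot \mathbf{1}_M\right\} \ge \eta,$$

completing the proof of the inductive step. □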
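For tiny instances, the relation between a game with finitely many experts and its non-atomic counterpart can be explored by brute force. The following sketch makes several assumptions:
- the mistake bound $M$ is constant and $\eta = \text{good}/N$;
- the number of rounds is fixed and large enough not to bind;
- experts reaching $M$ mistakes are dropped from the state, since with a constant bound they can no longer contribute to (1.2).

The sketch evaluates recursion (1.6) over all splits, which is exponential and therefore only usable for a handful of experts. It then compares the result with the Pascal-matrix formula (1.15). Because $e \cdot \mathbf{1}_M$ can only decrease during play, forbidding illegal intermediate states does not change the MB value. All names are hypothetical.

```python
from functools import lru_cache
from itertools import product
from math import comb


def advance(right, wrong, M):
    """Experts in `right` keep their count; experts in `wrong` get one more mistake.

    Experts reaching M mistakes are dropped from the (length-M) count vector.
    """
    shifted = (0,) + tuple(wrong[:M - 1])
    return tuple(r + w for r, w in zip(right, shifted))


def finite_value(counts, M, good, rounds):
    """Brute-force game value via recursion (1.6), for finitely many experts.

    counts[j] = number of experts with j < M mistakes (len(counts) == M);
    good = eta * N; a state is legal when it keeps at least `good` experts.
    """
    @lru_cache(maxsize=None)
    def value(state, l):
        if l == 0:
            return 0
        best = 0
        for split in product(*(range(c + 1) for c in state)):
            rest = tuple(c - s for c, s in zip(state, split))
            answers = [(1, advance(split, rest, M)), (2, advance(rest, split, M))]
            legal = [(c, o) for c, o in answers if sum(o) >= good]
            if not legal:
                continue  # the adversary may not present this split
            # Alice decides d; the adversary reveals a legal answer c (a mistake iff c != d).
            payoff = min(max(int(c != d) + value(o, l - 1) for c, o in legal)
                         for d in (1, 2))
            best = max(best, payoff)
        return best

    return value(tuple(counts), rounds)


def nonatomic_value(counts, M, good, rounds):
    """(1.15) for e = counts / N: the largest k <= rounds with P_k e . 1_M >= eta."""
    def below(k, m):  # C(k, <m) = sum of C(k, c) over 0 <= c < m
        return sum(comb(k, c) for c in range(max(0, min(m, k + 1))))
    return max(k for k in range(rounds + 1)
               if sum(c * below(k, M - j) for j, c in enumerate(counts)) >= good * 2 ** k)


for counts in [(4, 0), (6, 0), (8, 0)]:
    fin = finite_value(counts, 2, 1, 6)
    nat = nonatomic_value(counts, 2, 1, 6)
    print(counts, fin, nat, -(-nat // 2) <= fin <= nat)
```

The last column checks the sandwich $\lceil \frac{1}{2} V_\mathcal{D}(G_2) \rceil \le V_\mathcal{D}(G_1) \le V_\mathcal{D}(G_2)$ of Theorems 1.4.3 and 1.4.4 on these small cases.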

Consulting Finitely Many Experts in Prefix Mistake Bound Games 

Let us denote by $E^{\frac{1}{2},k}(e)$ the set of states that can result from the adversary executing $k$ "half" moves starting at state $e$. By Claim 1.4.6, $|E^{\frac{1}{2},k}(e)| = 1$. Similarly, $E^{a,k}(e)$ is the set of all possible outcome states for $k$ steps of the alternating strategy. It follows from the proof of Theorem 1.4.4 that



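Corollary 1.4.3 For a state vector $e$ in a game with $N$ experts, and for arbitrary $k, M > 0$ and $i > 0$, let $e^a \in E^{a,\lceil k/2 \rceil}(e)$ and $e^{\frac{1}{2}} \in E^{\frac{1}{2},k}(e)$. Then

$$e^{\frac{1}{2}} \cdot \mathbf{1}_M \ge \frac{i}{N} \Rightarrow e^a \cdot \mathbf{1}_M \ge \frac{i}{N}.$$

Proof: Apply Theorem 1.4.4 to the $N$ experts' game with $\eta = \frac{i}{N}$ and to the associated non-atomic game. □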

This implies that 

Theorem 1.4.5 For two Prefix Mistake Bound games $G_1, G_2$, identical except for the measure function, such that $G_1$ is a game with finitely many experts and $G_2$ is non-atomic,

$$V_\mathcal{D}(G_1) \ge \left\lceil \frac{1}{2} V_\mathcal{D}(G_2) \right\rceil.$$

Proof: Let $V = V_\mathcal{D}(G_2(e))$. By Theorem 1.4.2 we can assume w.l.o.g. that in the non-atomic game $G_2$ the adversary executes only stall moves and $V$ "half" moves. Referring to the "half" moves in this sequence as first, second, etc., the adversary can replace the D-D "half" splits at odd positions by D-D alternating splits. By Corollary 1.4.3 this is a legitimate adversarial strategy, achieving a payoff of $\lceil \frac{1}{2} V \rceil$. □

1.5 Games' Values and Managerial Strategies 

In the previous section it is shown that in an expert consulting game the adversary may be 
restricted to a small set of strategies without changing his worst-case payoff. This section 
looks at the conclusions a manager can derive from the understanding of her adversary 
that we gained. 

1.5.1 The Values of Games 

Theorem 1.4.2 and Claim 1.4.6 give simultaneous upper and lower bounds on the game's value and allow an easy computation of the value of various families of games. We give a few examples:
Theorem 1.5.1 The value of a non-atomic MB game $G$ of length $l$ is given by

$$V^{MB}(G, l) = \arg\max_{k \le l}\{P_k e^0 \cdot \mathbf{1}_{M(l)} \ge \eta(l)\}. \tag{1.15}$$

Theorem 1.5.2 The value of a non-atomic MB game $G$ of bounded length $\le l$ that is determined by the adversary is given by

$$V^{MB,bl}(G, l) = \arg\max_k \max_{\{i : k \le i \le l\}}\{P_k e^0 \cdot \mathbf{1}_{M(i)} \ge \eta(i)\}. \tag{1.16}$$

Notice that for a constant mistake bound $M(l) = M$, $\eta(l) = \eta$, the adversary's payoff is upper bounded by an expression that remains bounded as $l \to \infty$. Thus

Corollary 1.5.1 A non-atomic MB game $G$ with a constant mistake bound, in which the adversary gets to determine the length of the game, has a value of

$$V^{MB,cM}(G) = \arg\max_l\{P_l e^0 \cdot \mathbf{1}_M \ge \eta\}. \tag{1.17}$$
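The closed forms (1.15) through (1.17) are straightforward to evaluate. The following is a minimal sketch that assumes $e^0$ is given as a list of numbers and $M$, $\eta$ as Python functions of the round. `kept_share`, `value_fixed` and `value_bounded` are hypothetical names. The example uses a mistake bound that tightens late in the game, so letting the adversary stop early pays off for him.

```python
from fractions import Fraction
from math import comb


def kept_share(k, e, m):
    """P_k e . 1_m for a state vector e: sum_j e_j * C(k, <(m - j)) / 2^k."""
    return sum(Fraction(x) * sum(comb(k, c) for c in range(max(0, min(m - j, k + 1))))
               for j, x in enumerate(e)) / 2 ** k


def value_fixed(e, M, eta, l):
    """(1.15): the largest k <= l with P_k e . 1_{M(l)} >= eta(l)."""
    return max(k for k in range(l + 1) if kept_share(k, e, M(l)) >= eta(l))


def value_bounded(e, M, eta, l):
    """(1.16): the adversary may stop after any round i in [k, l]."""
    return max(k for k in range(l + 1)
               if any(kept_share(k, e, M(i)) >= eta(i) for i in range(k, l + 1)))


def eta(i):
    return Fraction(1, 8)


def drop(i):
    """A hypothetical mistake bound that tightens after round 5."""
    return 3 if i <= 5 else 1


print(value_fixed([1], drop, eta, 8), value_bounded([1], drop, eta, 8))   # 3 5
```

With a constant bound the two functions agree, and for large $l$ they coincide with (1.17).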



1.5.2 The Manager's Strategy 

Knowing the value of game states gives Alice the power to play the game opportunistically, 
as proven for the managerial strategy introduced in Definition 1.4.1. Having given explicit 
computable formulas of the values of games in the previous section, we can use these to 
present computationally efficient algorithms for expert managers. We term the algorithms 
presented in this section the PM algorithm, after the matrices they resort to. 

Non-Atomic Games

Notation 1.5.1 Denote by $\mathcal{A}^{MB}, \mathcal{A}^{MB,bl}, \mathcal{A}^{MB,cM}$ the strategies defined by

$$d_{\mathcal{A}^{XXX}}(o^1, o^2) = \arg\min_{i=1,2}\{v_i = \max\{V^{XXX}(o^i), V^{XXX}(o^{3-i}) + 1\}\}. \tag{1.18}$$

The XXX superscript can be replaced appropriately.
Theorem 1.5.3 Strategies $\mathcal{A}^{MB}, \mathcal{A}^{MB,bl}, \mathcal{A}^{MB,cM}$ are opportunistic in the respective non-atomic games.

Proof: By Claim 1.4.2, Theorems 1.5.1 and 1.5.2, and Corollary 1.5.1. □

Games with Finitely Many Experts 

Theorem 1.5.4 Strategies $\mathcal{A}^{MB}, \mathcal{A}^{MB,bl}, \mathcal{A}^{MB,cM}$ are (2,0) almost opportunistic in the respective games with finitely many experts.

Proof: Expert consulting games are zero sum, thus $V_\mathcal{D} = V_\mathcal{A} = V$. Therefore, the theorem follows from Theorems 1.4.3 and 1.4.4. □

Strategies $\mathcal{A}^{XXX}$ reach their decision by comparing the values of the respective $V^{XXX}$ functions. They can be refined to compare, in state $e$, the values of $P_{V^{XXX}(e)-1}\, o^1 \cdot \mathbf{1}_M$ and $P_{V^{XXX}(e)-1}\, o^2 \cdot \mathbf{1}_M$. These refined algorithms reach the same decision as the original algorithms whenever $V^{XXX}(o^1) \ne V^{XXX}(o^2)$. However, they allow a better comparison of options that are equivalent in the associated non-atomic game. Yet even the refined algorithms do not always reach the optimal decision, as the following example demonstrates. Thus they are not $V$-optimal for games with finitely many experts. The problem of finding efficient algorithms that are $V$-optimal remains open. Section 1.7.4 discusses the close relationship between expert consulting games and the better investigated faulty search games. For the latter, opportunistic algorithms for finitely many candidate values are known only for the cases $M = 1$ (see Pelc [59]) and $M = 2$ (see Guzicki [30]).

Example (Even the refined algorithms are not opportunistic): Consider a Constant Mistake Bound game with six experts and $\eta = \frac{1}{6}$. The state vectors below list the coordinates $0, \ldots, 4$, so the mistake bound is $M = 5$. Then

$$V\left(\tfrac{1}{6}(2,3,0,0,1)\right) = 9, \qquad V^{MB,cM}\left(\tfrac{1}{6}(2,3,0,0,1)\right) = 10.$$

Now if the adversary presents the split

$$n = \tfrac{1}{6}(0,3,0,0,1),$$

then, according to the findings of a computer program, the values of the two options are

$$V\left(\tfrac{1}{6}(2,0,3,0,0)\right) = 9, \qquad V\left(\tfrac{1}{6}(0,5,0,0,1)\right) = 8.$$

Yet, as can be readily verified,

$$V^{MB,cM}\left(\tfrac{1}{6}(2,0,3,0,0)\right) = V^{MB,cM}\left(\tfrac{1}{6}(0,5,0,0,1)\right) = 9.$$

If Alice uses the refined method to reach her decision, she finds that

$$P_9\,\tfrac{1}{6}(0,5,0,0,1) \cdot \mathbf{1}_M = \tfrac{651}{3072} > \tfrac{650}{3072} = P_9\,\tfrac{1}{6}(2,0,3,0,0) \cdot \mathbf{1}_M,$$

leading her to the wrong decision.
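The quantities in this example can be recomputed with exact integer arithmetic. The following sketch assumes $M = 5$ and $\eta = \frac{1}{6}$ as above, and represents states by expert counts, so every value is scaled by $6 \cdot 2^k$. The helper names are hypothetical. The exact finite-game values 8 and 9 come from the thesis's computer program and are not recomputed here.

```python
from math import comb

M, GOOD = 5, 1   # six experts, eta = 1/6; counts below are multiples of 1/6


def kept(k, counts):
    """6 * 2^k * (P_k e . 1_M) for e = counts / 6, as an exact integer."""
    return sum(c * sum(comb(k, t) for t in range(max(0, min(M - j, k + 1))))
               for j, c in enumerate(counts))


def value_cm(counts):
    """(1.17): the largest k with P_k e . 1_M >= eta (assumes k = 0 qualifies)."""
    k = 0
    while kept(k + 1, counts) >= GOOD * 2 ** (k + 1):
        k += 1
    return k


def options(e, n):
    """Options for split n, truncated to M coordinates (experts at M mistakes drop out)."""
    rest = [a - b for a, b in zip(e, n)]
    o1 = [a + b for a, b in zip(n, [0] + rest[:-1])]
    o2 = [a + b for a, b in zip(rest, [0] + n[:-1])]
    return o1, o2


e, n = [2, 3, 0, 0, 1], [0, 3, 0, 0, 1]
o1, o2 = options(e, n)
print(o1, o2)                                    # [0, 5, 0, 0, 1] [2, 0, 3, 0, 0]
print(value_cm(e), value_cm(o1), value_cm(o2))   # 10 9 9
print(kept(9, o1), kept(9, o2))                  # 651 650
```

The refined rule therefore favours option one by a single unit out of $3072$. With the program's values $V(o^1) = 8$ and $V(o^2) = 9$, however, deciding 2 secures $\max\{8 + 1, 9\} = 9$, while deciding 1 risks $\max\{8, 9 + 1\} = 10$.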

1.6 Efficiency Issues 

Next we discuss the efficient implementation of the PM algorithms and their asymptotic 
performance. We also refer to the lower bounds implied by their opportunism. 

1.6.1 From Strategies to Algorithms 

Alice can use the theorems in Section 1.5.2 as explained in Section 1.5.1. When the adversary presents Alice with a set of options, she needs to compute and compare the values of the appropriate $V^{XXX}$ function for these options to reach a decision.

The value of the initial state vector $e$ can be found by repeated doubling, in two phases:
1. Compare $P_l e \cdot \mathbf{1}_M$ to $\eta$ for a sequence of matrices with $l = 2^0, 2^1, 2^2, \ldots$.
2. Once $P_l e \cdot \mathbf{1}_M$ becomes smaller than $\eta$ for the first time, binary-search between the last two values of $l$ that were evaluated for the first $l$ at which $P_l e \cdot \mathbf{1}_M < \eta$.

The search thus takes at most $2\lg V^{XXX}(e)$ steps. Each step requires computing $M$ matrix values and a multiplication by a vector. The factorial terms in each matrix entry require $O(M)$ multiplications each, for a total of $O(M^2)$ computational steps per matrix. These steps operate on numbers of size exponential in $M$, and each multiplication can be carried out in poly-logarithmic time. The bounds on the value of states in Section 1.6.2 are polynomial in $M$ and $\lg\frac{1}{\eta}$. Thus the off-line precomputation can be carried out in polynomial time.
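The search just described might look as follows. The sketch assumes the state is given as integer counts of $N$ experts and $\eta = \text{good}/N$, so every comparison is exact integer arithmetic. Python's `math.comb` supplies the binomial coefficients in place of explicit factorials. The function names are hypothetical.

```python
from math import comb


def kept(k, counts, M):
    """N * 2^k * (P_k e . 1_M) for e = counts / N, as an exact integer."""
    return sum(c * sum(comb(k, t) for t in range(max(0, min(M - j, k + 1))))
               for j, c in enumerate(counts))


def value_by_doubling(counts, M, good):
    """V^{MB,cM}(e) by repeated doubling followed by binary search.

    Relies on P_k e . 1_M being non-increasing in k, so the k that satisfy the
    bound form an interval 0..V. Assumes the start state itself is legal.
    """
    def holds(k):
        return kept(k, counts, M) >= good * 2 ** k

    lo, hi = 0, 1
    while holds(hi):            # doubling phase: l = 1, 2, 4, ...
        lo, hi = hi, 2 * hi
    while hi - lo > 1:          # invariant: holds(lo) and not holds(hi)
        mid = (lo + hi) // 2
        if holds(mid):
            lo = mid
        else:
            hi = mid
    return lo


print(value_by_doubling([2, 3, 0, 0, 1], 5, 1))   # 10
print(value_by_doubling([1024], 1, 1))            # 10
```

The second call is plain halving: with $M = 1$ the best of 1024 experts never errs, and the value is $\lg 1024 = 10$.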

After the off-line pre-computation of $V^{XXX}(e)$ has been carried out, the values of $P_{l-1}$ can be computed from those of $P_l$ in $O(M)$ multiplications. At state $e$, by (1.7), the value of at least one of the two options presented to Alice is at least $V^{XXX}(e) - 1$. Therefore, Alice needs to multiply the options by one matrix only, so at most two matrix-vector multiplications are required to reach a decision. If the adversary plays suboptimally, additional computations might be needed to compute the pseudo value of the new state.

The complexity of an on-line step can be reduced further by converting PM to an equivalent weighting scheme. Note that coordinate $n_i$ of the split "votes" for both options. It votes for option one with weight $\binom{l}{<(M-i)} n_i$ and for option two with weight $\binom{l}{<(M-i-1)} n_i$. Now let the experts of $n_i$ vote with weight $\binom{l}{M-i-1} n_i$ for decision 1, and let the experts in $e_i - n_i$ vote with weight $\binom{l}{M-i-1}(e_i - n_i)$ for decision 2. Summing up these weights over all coordinates of $n$ and voting with the heavier set leads Alice to the same decision.
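The equivalence between the refined PM comparison and the weighting scheme can be checked directly. The sketch below assumes integer expert counts and a matrix order $l$ supplied by the caller:
- `pm_margin` computes $N 2^l (P_l o^1 - P_l o^2) \cdot \mathbf{1}_M$ from the two options;
- `weighted_margin` lets an expert with $i$ mistakes vote with weight $\binom{l}{M-i-1}$.

Both names are hypothetical.

```python
from math import comb


def below(k, m):
    """C(k, <m): the sum of C(k, c) over 0 <= c < m."""
    return sum(comb(k, c) for c in range(max(0, min(m, k + 1))))


def pm_margin(e, n, l, M):
    """N * 2^l * (P_l o1 - P_l o2) . 1_M for e = counts / N and split n."""
    rest = [a - b for a, b in zip(e, n)]

    def kept(stay, err):  # `stay` keeps coordinate i, `err` moves to i + 1
        return sum(s * below(l, M - i) + r * below(l, M - i - 1)
                   for i, (s, r) in enumerate(zip(stay, err)))

    return kept(n, rest) - kept(rest, n)


def weighted_margin(e, n, l, M):
    """Weight of the experts voting 1 minus the weight of those voting 2."""
    w = [comb(l, M - i - 1) if 0 <= M - i - 1 <= l else 0 for i in range(len(e))]
    ones = sum(wi * ni for wi, ni in zip(w, n))
    twos = sum(wi * (ei - ni) for wi, ei, ni in zip(w, e, n))
    return ones - twos


e, n = [2, 3, 0, 0, 1], [0, 3, 0, 0, 1]
print(pm_margin(e, n, 9, 5), weighted_margin(e, n, 9, 5))   # 1 1
```

The two margins agree exactly, not only in sign, because $\binom{l}{<m} - \binom{l}{<(m-1)} = \binom{l}{m-1}$. An on-line step therefore needs only one weighted vote per coordinate instead of two full matrix-vector products.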

Corollary 1.6.1 The PM strategies are computationally efficient. 

1.6.2 Absolute Performance 

Opportunism reflects the performance of algorithms relative to other algorithms for the problem. Herein we obtain absolute loss bounds. Proving that these algorithms are opportunistic means that these bounds are also lower bounds on the performance of any algorithm for the problem. They hold even for computationally unlimited algorithms and for the expected performance of randomized algorithms.

Theorem 1.6.1 The total loss of algorithm $\mathcal{A}^{MB}$ in a non-atomic MB game of length $l$ with a uniform prior on experts is

$$\arg\max_{k \le l}\left\{\frac{1}{2^k}\binom{k}{<M(l)} \ge \eta(l)\right\}. \tag{1.19}$$

Any other strategy's expected loss for this problem is at least as high on some inputs.

Theorem 1.6.2 The total loss of algorithm $\mathcal{A}^{MB,cM}$ in a non-atomic MB game with $M(l) = M$ and a uniform prior is

$$\arg\max_k\left\{k \le \lg\frac{1}{\eta} + \lg\binom{k}{<M}\right\} = \lg\frac{1}{\eta} + M\lg\lg\frac{1}{\eta} + O(M\lg M). \tag{1.20}$$

Any other strategy's expected loss for this problem is at least as high on some inputs. (The right side of (1.20) is due to Rivest et al. [61].)

1.7 Extensions and Implications 

The results proven thus far for expert consulting games can be generalized in various 
ways. They have interesting parallels to those obtained for the problem of searching in 
the presence of errors. New proofs of old results as well as some novel results in that area 
follow from what we have shown. 

1.7.1 Consulting Finitely Many Experts: Comparing PM to BW and WM

The PM algorithms are more general than the previously known WM and BW algorithms.
In particular, their performance guarantees hold for arbitrary initial expert state vectors. This
can be used to incorporate a prior the manager has on his/her consultants. Such a prior
can represent an initial estimate of the quality of the experts' predictions. It can also
be based on other parameters like experts' salaries, or, in the case of model selection, the
candidate models' complexity. Our algorithms further allow arbitrary values of η, while previously known
algorithms are restricted to the case of tracking the best expert.

If we limit our discussion to games of tracking the best expert with a uniform prior,
then PM is (2,0) almost opportunistic, while the previously known algorithms are not.
Littlestone and Warmuth [44] suggest that WM can update the weights of experts either in
every round or only in rounds in which the manager errs, with the same asymptotic
performance. An important difference between BW and PM is that, unlike PM,
BW updates the weights only in rounds in which the algorithm errs. The superiority of
PM provides evidence in favor of weight updates in every round.

The upper bounds on the worst-case asymptotic performance of PM are the same as 
those of BW. Cesa-Bianchi et al. [18] argue they are superior to those of WM. The next 
example proves that the opportunistic ratio of BW is unbounded. 

Example (BW's opportunistic ratio is unbounded): Let us define a family $\{G_k\}$ of
expert consulting games. Game $G_k$ is a game with n = 2 k + 1 experts in which the best
expert is known not to err.

Consider the following adversarial strategy: in the first k rounds the adversary splits
the experts that have not erred yet into two almost equal subsets, whose sizes differ by
exactly one. The experts that erred in previous rounds join the bigger set. Both
BW and PM vote with the bigger set of experts, which the adversary announces correct
in these rounds. Thus by the end of round k, the manager and a single "good" expert
have incurred no errors, while all other experts have erred once.

The value of the subgame starting after round k is zero as the identity of the non-erring 
expert was already revealed. The PM algorithm "knows" which expert is "good", while 
BW ignores the information contained in the opening rounds. 

In the next k rounds the experts split into two subsets whose sizes differ by one,
with the "good" expert voting with the minority. BW makes k mistakes in these rounds, while
PM votes correctly. Thus, while we have proven that PM is (2,0) almost opportunistic, BW is
not, as the ratio between the number of mistakes made by BW and the number of
mistakes made by an opportunistic algorithm can be arbitrarily large.

1.7.2 Consulting Experts on Multiple Choice Questions 

So far we have considered decision domains of size 2. That is, the advice of the experts, as well
as Alice's decision, answered a "yes/no" question. In this section we prove that the PM
algorithms as defined in section 1.5.2 are opportunistic or almost opportunistic when the
advice of the experts as well as Alice's decision come from a finite set D of arbitrary size
|D| > 2.

For domains of size |D| > 2, a split of vector e is a |D|-tuple of vectors in $\Re^*$,
$\mathbf{n} = (n^1, n^2, \ldots, n^{|D|})$, such that $\sum_i n^i = e$. The respective options are now defined to be
$o^i = n^i + (e - n^i) \gg 1$.

The adversary can play strategy $D_{\frac{1}{2}}$ by choosing splits $\mathbf{n} = (\frac{e}{2}, \frac{e}{2}, 0^*, \ldots, 0^*)$ in the game
with an arbitrary decision domain D. Thus the value of the game is lower bounded by
its pseudo value, V(G(e)) ≥ V xxx (e). To prove the opposite inequality we notice that
the proofs of the Claims of section 1.4 generalize to games with bigger decision domains.
Thus the properties of the pseudo value functions established in section 1.5.1 still hold,
allowing Alice to use the PM algorithms in games with bigger decision domains with the
same performance guarantees.

1.7.3 "Real" Managers 

Alice may be allowed to make decisions in the domain [0, 1]. If she decides d while the
correct answer turns out to be c, she is charged a loss of |d - c|, known as the absolute loss.
If Alice's strategy is to make random binary predictions with probability d of predicting
one, then the absolute loss measures Alice's expected loss in the game.

Notice that if we replace Alice's [0, 1] decisions by binary decisions in the obvious way,
making her vote one whenever d > 1/2, then the loss she incurs is at most twice the loss she
would incur if she were allowed to make decisions in [0, 1]. Thus the strategies described
in section 1.5.2 are (2,0) almost opportunistic in a non-atomic game and (4,0) almost
opportunistic in an atomic game.
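The factor of two comes from a per-round argument: a rounded vote is wrong only when d lies on the wrong side of 1/2, and then |d - c| ≥ 1/2 already. A minimal check of that per-round inequality over a grid of decisions, using exact rationals (the function names are illustrative only):

```python
from fractions import Fraction

def round_decision(d):
    """Binary vote obtained from a [0, 1] decision: vote one whenever d > 1/2."""
    return 1 if d > Fraction(1, 2) else 0

def check_factor_two(steps=100):
    """Check |round(d) - c| <= 2 |d - c| for c in {0, 1} and d on a grid in [0, 1]."""
    for k in range(steps + 1):
        d = Fraction(k, steps)
        for c in (0, 1):
            if abs(round_decision(d) - c) > 2 * abs(d - c):
                return False
    return True

print(check_factor_two())
```

Summing the per-round inequality over all rounds gives the bound on the total loss stated above.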

1.7.4 Searching in the Presence of Errors 

The problem of searching interactively under the assumption that some answers may be
erroneous is well investigated. It was first introduced by Ulam [69] and addressed by
numerous researchers [61, 60, 22, 30, 24, 6, 66, 67, 5]. Most of these papers model it by
a multistage game. The searcher is commonly called Paul and his adversary Carole. The
problem most of these papers address is searching for a single value in a finite domain. In
Rivest et al. [61] and Pelc [60] the authors consider finding the ε-vicinity of a single real
value in a given segment.

The various versions of games of searching by questioning a liar using arbitrary membership
queries are comparable to expert consulting games. The representation of the state
in such games is the same as in expert consulting games, but the protocol differs. In a
search game the searcher presents a query to Carole in the form of a subset of the set of
candidate values. Carole replies whether the target value is in this subset or in its complement.
Thereby she decides which of the two subsets of values accumulates one more vote
against being the target value. Some authors associate each candidate
value with a chip, hence the name "chip games" [6]. The chips are positioned in a sequence
of piles on a ray. The allowed positions for the various piles on this ray are indexed by the
natural numbers. Paul is called the chooser, for his role of choosing the questioned subset,
while Carole is called the pusher, as she selects the subset of chips that will be pushed one position
ahead on this ray, representing the fact that the values associated with these chips have incurred an
extra vote against being the target value. The game proceeds in rounds. Various
limitations are placed on the ways in which Carole is allowed to lie. She might be allowed
a constant number of lies [61, 59]. Games in which Carole is allowed to lie $\lfloor rl \rfloor$ times in a
game of l rounds are addressed by Aslam and Dhagat [6] and Spencer and Winkler [67].
This limitation might be enforced when the game terminates, or at each round.
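A single round of the chip game described above can be sketched as an update of a pile vector. This is a minimal illustration under our own assumptions: entry p of the state counts chips at position p, the chooser's query is given as the number of chips it takes from each pile, and a chip is treated as eliminated once its position exceeds the lie bound M. The names `play_round` and `chips_alive` are hypothetical.

```python
def play_round(state, split, push_split):
    """One round of a chip game.

    state[p] is the number of chips at position p (votes against the value so far);
    split[p] is how many of those chips lie in the subset the chooser put forward.
    If push_split is True the pusher advances the chips in that subset, otherwise
    she advances the complement. Every advanced chip moves one position up.
    """
    moved = split if push_split else [s - q for s, q in zip(state, split)]
    new = [s - m for s, m in zip(state, moved)] + [0]
    for p, m in enumerate(moved):
        new[p + 1] += m
    return new

def chips_alive(state, M):
    """Chips that are still candidates when at most M votes against a value are tolerated."""
    return sum(state[:M + 1])

state = [8]                               # eight candidate values, no votes against yet
state = play_round(state, [4], True)      # the queried half is pushed
state = play_round(state, [2, 2], False)  # this time the complement is pushed
print(state, chips_alive(state, M=1))     # prints: [2, 4, 2] 6
```

Only the pile counts are tracked, which matches the state representation shared by search games and expert consulting games; which particular chips move is irrelevant to the value of the game.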

In the expert consulting game the adversary (Carole) chooses a subset of X. Alice
decides which set she wants to vote with, and then the adversary announces whether Alice is right or
wrong. Unlike Paul, Alice is restricted to choosing the queried set from the two candidates
presented by her adversary. On the other hand, while Paul is charged for each question he
poses, Alice's "questions" can be interpreted as "guesses"; accordingly, she is charged
for bad guesses only.

Suppose we restrict the adversary in a Mistake Bound game to always disagree with Alice's
decision, and give him the power to stop the game at any legal state he chooses. By
Theorem 1.4.1 this does not change the game's value. Now the adversary becomes the
chooser (Paul) in a chip game and Alice the pusher (Carole). However, the chooser's aim
in this game is to prolong the game, while the pusher tries to shorten it. This is a
reversal of goals as compared to the goals of Paul and Carole: in a faulty search game the
chooser is the one trying to shorten the game.

Recall the class of game trees from Definition 1.3.4. The value of an expert game is

$$\max_{\text{trees}} \; \min_{\text{leaves}} \; \operatorname{depth}(\text{leaf}),$$

while the value of a Paul-Carole game is

$$\min_{\text{trees}} \; \max_{\text{leaves}} \; \operatorname{depth}(\text{leaf}).$$

Alice thus has to make no more mistakes than Paul has to ask questions. For non-atomic games the values of
the two games are equal, as "Paul" can ask about "half" splits.

Our approach generalizes the one common in papers addressing the faulty search problem
in that it allows the searcher to express a non-uniform prior on the set of candidate values,
by choosing arbitrary start vectors, not only (1). We further analyze larger classes of M(l)
and η(l) functions. We thus extend results of Rivest et al. [61] for what they call continuous
games.

Mistake-bounded adversaries making linearly many mistakes, $M(l) = \lfloor rl \rfloor$, against
membership queries of Paul were explored by Spencer and Winkler [67]. We show

Theorem 1.7.1 Paul needs at least , 4 _° , 2 questions to find the hidden number out of N 
candidates in a Mistake Bound (Version A of Spencer and Winkler [67]) game. 

Proof: A Faulty Search Game with finitely many candidate values is related to a non-atomic
search game in a relationship similar to that existing in Expert Consulting Games.
The value of the associated non-atomic game is a lower bound (rather than an upper bound,
as in expert consulting games) on the value of a game with finitely many candidate values.
The value of a non-atomic Faulty Search Game is in turn lower bounded by the value of
the associated Expert Consulting Game, as explained above. Thus the number of questions
Paul needs to pose to find one out of N candidate values is at least
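$$\arg\max_l \left\{ P_l(1) \cdot \mathbf{1}_{\le \lfloor rl \rfloor} \ge \frac{1}{N} \right\} = \arg\max_l \left\{ 2^{-l}\binom{l}{\le \lfloor rl \rfloor} \ge \frac{1}{N} \right\}, \tag{1.21}$$

where $2^{-l}\binom{l}{\le \lfloor rl \rfloor} = \Pr\{\le \lfloor rl \rfloor \text{ successes in } l \text{ throws of an unbiased coin}\}$.

Using Chernoff's bound [38],

$$\Pr[S_l < (1-\gamma)pl] < e^{-lp\gamma^2/2},$$

we let $p = 1/2$ and $\gamma = 1 - 2r$, so that $(1-\gamma)pl = rl$, and get

$$e^{-l(1-2r)^2/4} > e^{-\ln N},$$

or

$$l < \frac{4\ln N}{(1-2r)^2}.$$

□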
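The quantity in (1.21) can be computed exactly for small parameters and set against the final inequality of the proof. The sketch below (function names hypothetical) scans $l$ only up to $4\ln N/(1-2r)^2$: by the Chernoff bound the condition cannot hold beyond that point. It uses the integer form $N\binom{l}{\le \lfloor rl \rfloor} \ge 2^l$ of the inequality and a `Fraction` for r so that $\lfloor rl \rfloor$ is exact. It assumes $0 \le r < 1/2$.

```python
import math
from fractions import Fraction

def tail_count(l, r):
    """Number of length-l coin sequences with at most floor(r*l) successes."""
    k = math.floor(r * l)
    return sum(math.comb(l, i) for i in range(k + 1))

def max_rounds(N, r):
    """Largest l with 2**-l * C(l, <= floor(r*l)) >= 1/N, scanned up to the Chernoff cutoff."""
    r = Fraction(r)
    cutoff = 4 * math.log(N) / (1 - 2 * r) ** 2
    best = 0
    for l in range(math.ceil(cutoff) + 1):
        if N * tail_count(l, r) >= 2 ** l:  # exact integer form of the inequality
            best = l
    return best, cutoff

best, cutoff = max_rounds(10 ** 6, Fraction(1, 10))
print(best, cutoff)  # exact argmax in (1.21) versus the bound 4 ln N / (1 - 2r)^2
```

For r = 0 the condition reduces to $2^l \le N$, so the scan returns $\lfloor \lg N \rfloor$, the familiar error-free binary search count.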

Spencer [66] implicitly proposes to evaluate states in the search game by equation (1.17)
for η = 1. His rationale for using these weights stems from considering a randomized
strategy of Carole in which she uses a fair coin to decide her answers. The probability of a
chip advancing no more than s positions in j rounds is then $2^{-j}\binom{j}{\le s}$. Equation (1.17)
arises by asking whether the expected number of chips on the game board after a given
number of steps is at least one.
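Spencer's evaluation can be made concrete: under Carole's fair-coin strategy each chip is pushed in a given round with probability 1/2, independently across rounds, so by linearity of expectation the expected number of surviving chips is a weighted sum of the pile sizes. The sketch below computes the probability both from the closed form and by brute-force enumeration of push patterns. The names and the convention that a chip survives while its position is at most M are our own assumptions.

```python
from fractions import Fraction
from itertools import product
from math import comb

def advance_prob(j, s):
    """P(a chip advances at most s positions in j rounds of fair-coin pushes) = 2^-j C(j, <= s)."""
    if s < 0:
        return Fraction(0)
    return Fraction(sum(comb(j, i) for i in range(min(s, j) + 1)), 2 ** j)

def advance_prob_bruteforce(j, s):
    """Same probability, obtained by enumerating all 2**j push patterns."""
    good = sum(1 for pushes in product((0, 1), repeat=j) if sum(pushes) <= s)
    return Fraction(good, 2 ** j)

def expected_chips(state, M, j):
    """Expected number of chips still at positions <= M after j random rounds."""
    return sum(c * advance_prob(j, M - p) for p, c in enumerate(state))

print(expected_chips([4, 2], M=1, j=3))
```

Asking whether this expectation is at least one is exactly the test behind equation (1.17) with these weights.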

1.7.5 The Relative Game 

In this section we consider a game in which Alice's score is the number of mistakes she made
in excess of her best advisors rather than the absolute number of mistakes she made. The
states' representation remains unchanged, but the semantics of states in the relative game
is different. The j-th coordinate of the expert state vector now represents the measure of
the set of experts that made j fewer mistakes than Alice. Thus when split $n^1$ is presented
at some game state $e^i$, the game can move into one of four states. If Alice's decision is one,
the game can move into one of the states

$$e^{i+1} = n^1 + (e^i - n^1) \gg 1, \qquad e^{i+1} = n^1 + (e^i - n^1) \ll 1.$$

Two analogous states are possible if she makes the other decision.
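The two successor states can be sketched with a sparse vector keyed by relative position, since in the relative game a coordinate may move below zero. This is a hypothetical illustration: the labels attached to the two states are our reading of the semantics above. If Alice errs, the experts who voted the other way gain one position relative to her. If she is right, those experts lose one.

```python
from collections import Counter

def shift(vec, delta):
    """Move every coordinate of a sparse state vector by delta positions."""
    return Counter({j + delta: m for j, m in vec.items()})

def successors(e, n1):
    """Successor states when Alice votes with split n1 at state e.

    Coordinate j counts experts that made j fewer mistakes than Alice.
    Counter arithmetic drops non-positive entries, so n1 must not exceed e.
    """
    rest = e - n1                        # experts that voted for the other option
    alice_wrong = n1 + shift(rest, +1)   # Alice errs; the others gain one relative to her
    alice_right = n1 + shift(rest, -1)   # Alice is right; the others erred
    return alice_wrong, alice_right

e, n1 = Counter({0: 3, 1: 2}), Counter({0: 1, 1: 2})
print(successors(e, n1))
```

The experts in $n^1$ agree with Alice and therefore keep their relative position in both successor states.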

If the adversary is required to satisfy the game termination condition, the game becomes
a win/lose game in which the adversary either does or does not have a strategy that meets
the requirements.

A game termination condition may still be imposed as in absolute games. Alternatively,
the adversary may not be required to satisfy a game termination condition in the relative
game. The score in a relative game can then be defined as

$$\arg\max_M \left\{ e \cdot \mathbf{1}_{\le M} < \eta(l) \right\}. \tag{1.22}$$

That is, Alice is scored by the number of mistakes she made in excess of a prespecified share
η of her advisors that make the smallest number of mistakes of all of her advisors in the
game. Her aim is to minimize the score.

The analysis of section 1.4 still holds, by similar argumentation. We can assume w.l.o.g.
that the adversary always disagrees with Alice's decision, showing that the problem of consulting
experts in the relative game is reducible to the faulty search problem (a Paul-Carole search
game).

The equivalent of Theorem 1.4.2 likewise holds for relative games, stating that strategy
$D_{\frac{1}{2}}$ is opportunistic in non-atomic games. Thus the value of non-atomic relative games can
be computed efficiently, as in Theorems 1.5.1, 1.5.2 and Corollary 1.5.1. For example:

Theorem 1.7.2 The value of a relative game of known length l with an unspecified termination
condition is given by

$$\arg\max_M \left\{ P_l e \cdot \mathbf{1}_{\le M} < \eta(l) \right\}.$$

This in turn allows us to specify computationally efficient opportunistic managerial strategies
in the non-atomic game.

If a winning strategy for the adversary (Paul) exists in a game with finitely many
experts, such a strategy clearly exists in the associated non-atomic game, giving a necessary
condition for the victory of the adversary. Corollary 1.4.3 establishes a sufficient condition,
giving us

Theorem 1.7.3 In a Mistake Bound game with N experts, if

$$P_l e^0 \cdot \mathbf{1}_{\le M} \ge \eta(l) \tag{1.23}$$

then Alice wins. If

$$P_{2l} e^0 \cdot \mathbf{1}_{\le M} < \eta(l)$$

then the adversary wins.

This establishes the same necessary condition for Paul's (the adversary's) victory as that
in Spencer's [66] and a different sufficient condition.
Notice that 

UH 

Thus for l, c sufficiently large depending on M, if $e^0_{M-1} > c\,l^M$, Condition (1.23) becomes
sufficient as well as necessary, as the adversary can use the most erring experts counted by
coordinate $e_{M-1}$ (those of weight l in the vector $P_l e$) to choose splits for which the weight
of both associated options is equal throughout the game. He can alternately split all but
the most erring experts, and use those to balance the weight of the two options. Thereby he
effectively plays a strategy equivalent to $D_{\frac{1}{2}}$. This gives a different proof of the validity
of the sufficient condition (the main result) specified by Spencer [66].

Based on computer experiments, the author conjectures that when presented with two
options $o^1, o^2$ in a relative game with finitely many experts, the manager (pusher) can make
an optimal decision by comparing the values of the same states in the associated non-atomic
game, except for state vectors in which the least erring expert made significantly fewer
mistakes than the other experts. The evaluation function for non-atomic games attaches
excessive weight to such experts. Instead, the value of such states in a game with finitely
many experts is equal to the value of an almost identical state in which the least erring
expert is charged a single additional error. Proving this conjecture would reduce the problem
of computing the exact value of states in the relative game with finitely many experts,
addressed by Pelc [59] and Guzicki [30] for M = 1 and M = 2 respectively, to the problem
of calculating the number of moves in a game with prespecified strategies for both sides.

1.8 Conclusion 

Chip games were explored previously as a model of a faulty search procedure. We show
they can be used to model expert consulting situations as well. A chip game in which the
goals of the pusher and chooser are exchanged is investigated as a model of another variant
of the expert consulting problem. Our exploration of these games proceeds via an analysis
of what we propose to call non-atomic chip games. Both games with finitely many experts,
which correspond to the chip games described in the existing literature, and non-atomic
games are instances of a more general model of interest: games in which an arbitrary
measure is defined over the set of chips.

We derive the exact value of non-atomic chip games. For previously explored chip- 
games, corresponding to our games with finitely many experts, a similar result is known only 
for games with M = 1,2, while the general question remains unanswered. The specification 
of opportunistic strategies, or better yet algorithms, for both players in these games is a 
problem we leave open. Computationally it is no less interesting than the older problem 
of figuring out the value of such games, and can serve as a stepping stone towards that 
problem's resolution. 
Chapter 2

Analysis of Greedy Expert Hiring and an Application to Memory-Based Learning

2.1 Introduction and Definitions 

One of the great challenges in modeling human capacity is overcoming redundant informa- 
tion. Some researchers conjecture this is the main function of the early processing carried 
out by the brain (Marr [47]). Extracting that which is essential is likewise a difficult algo- 
rithmic problem arising in various circumstances - the clique problem, traveling salesman 
and many more. In the context of computational learning it was addressed explicitly by 
Blum [16], Blum et al. [15], Littlestone [43], Ben-David and Dichterman [10, 11], Birkendorf 
et al. [14]. 

Consider a manager faced with the task of hiring experts from a pool of N candidates.
We assume that he can find out the utility of hiring particular sets of experts by querying
an oracle $x: 2^N \to \Re$. Finding an optimal solution of size k would force him to look at
$\binom{N}{k}$ possible sets of experts. This number is exponential in k. Therefore, looking for an
optimal solution in the general case is infeasible for large k.

The manager may choose to substitute local considerations for global ones by hiring, for
example, one expert at a time greedily, that is, hiring at step j the expert that contributes
most to the set of j - 1 experts that were already hired. This reduces the complexity
(number of calls to the oracle) to a reasonable O(kN).
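A minimal sketch of one-at-a-time greedy hiring that also counts oracle calls. Each step queries every remaining candidate once, so hiring k experts out of N costs at most kN calls. The coverage-style oracle (value = number of distinct skills covered) and the candidate names are hypothetical stand-ins for the manager's oracle.

```python
def greedy_hire(candidates, oracle, k):
    """Hire k experts one at a time, each time taking the expert that maximizes the
    oracle value of the enlarged coalition. Returns (hired, number_of_oracle_calls)."""
    hired, calls = [], 0
    remaining = list(candidates)
    for _ in range(k):
        best, best_val = None, None
        for e in remaining:
            calls += 1
            val = oracle(frozenset(hired + [e]))
            if best_val is None or val > best_val:  # ties keep the first candidate seen
                best, best_val = e, val
        hired.append(best)
        remaining.remove(best)
    return hired, calls

# Hypothetical pool: each candidate covers some skills; a coalition is worth the skills it covers.
skills = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}

def coverage(S):
    return len(set().union(*(skills[e] for e in S)))

hired, calls = greedy_hire(skills, coverage, 2)
print(hired, calls)  # prints: ['a', 'c'] 7
```

The call count is $N + (N-1) + \cdots + (N-k+1) \le kN$, in contrast with the $\binom{N}{k}$ coalitions an exhaustive search would evaluate.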

The optimism of the greedy heuristic can be contrasted with the pessimism underlying 
worst-case analysis. The author believes that most conscious data processing is carried 
out by simple heuristics. Thus a detailed understanding of the conditions governing the 
performance of such heuristics will contribute no less to our understanding of conscious 
intelligence than for example the exploration of complex optimization algorithms. 

For an unrestricted input no performance guarantees can be provided for this heuristic. 
However, for functions x that are monotone and concave a uniform lower bound on the 
performance of greedy hiring holds. Nemhauser et al. [52] present a bound on the ratio 
between the value of the greedy approximation and that of the optimal solution. 

When the value of sets of experts has to be estimated by sampling, the manager may
only have access to approximate values of such sets, rather than exact values. We show that
a uniform lower bound on the quality of a greedy approximation for the same family of
games remains valid.

This can model, for example, a learning situation in which the learner has access to 
some "learning engine" as a subroutine and to a source of labeled examples. The learner 
(manager) can draw a labeled sample, and then run the learning subroutine on a filtering 
of that sample that passes to the learning subroutine only the values of the coordinates in 
some chosen subset of coordinates. He can find out or estimate the "value" of the various 
subsets of coordinates by testing the hypothesis this subroutine produces. His goal is to 
choose a good subset of the coordinates (select features) while minimizing the number of 
invocations of the learning subroutine. 

The analysis of the performance of greedy hiring in coalitional games implies a lower
bound on approximations of the s-median problem defined below. Approximation algorithms
for the s-median problem are, in turn, a useful tool in the development of a
learning algorithm for Lipschitz functions (Lin and Vitter [42]).

A memory-based learning system is a system that approximates (learns) a given function
$f: X \to Z$ in the following manner: an instance of the input space X is mapped by an
encoder $\gamma$ to the addresses of one or more memory locations. The contents of these locations
are combined by a decoder $\beta$ to produce an output in Z. Learning can be done in batch
mode or on-line. The encoder or the decoder or both of them can be learned. A memory-based
learning system can be evaluated by its sample, time, and space complexities.

Conceivably, such systems can be used to learn functions over both discrete and contin- 
uous domains. Lin and Vitter [42] give a historical overview of early research on memory- 
based learning systems. Their stated main result is a memory-based learning system that 
PAC-learns in polynomial time and space, to which we propose an alternative. 

A Voronoi system is a very simple memory-based learning system. The system can be
specified by s pairs $\{(x_i, z_i)\}_{i=1}^{s}$, where $x_i \in X$, $z_i \in Z$. The encoder maps a point x to the
index of its nearest neighbour in $\{x_i\}_{i=1}^{s}$, say i if the nearest point is $x_i$. The decoder
outputs $z_i$. The $x_i$-s do not have to be stored explicitly. We call s the size of the system.
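The encoder/decoder split of a Voronoi system can be sketched in a few lines. This is an illustrative sketch, not Lin and Vitter's system. It assumes points are coordinate tuples and uses the Euclidean metric (`math.dist`) by default; any other metric can be passed in. The class name is hypothetical.

```python
import math

class VoronoiSystem:
    """Memory-based learner given by s pairs (x_i, z_i): the encoder returns the index of
    the nearest stored point, and the decoder returns the value stored under that index."""

    def __init__(self, pairs, dist=math.dist):
        self.points = [x for x, _ in pairs]
        self.values = [z for _, z in pairs]
        self.dist = dist

    def encode(self, x):
        # index of the nearest stored point (ties go to the lowest index)
        return min(range(len(self.points)), key=lambda i: self.dist(x, self.points[i]))

    def decode(self, i):
        return self.values[i]

    def predict(self, x):
        return self.decode(self.encode(x))

vs = VoronoiSystem([((0.0,), "low"), ((10.0,), "high")])
print(vs.predict((2.0,)), vs.predict((7.0,)))  # prints: low high
```

Only the decoder's table of $z_i$ values is indexed by memory location; the encoder needs the $x_i$ only implicitly, through whatever structure answers nearest-neighbour queries.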

A function f satisfies the Lipschitz condition if there exists a constant K such that

$$(\forall x, x' \in X)\; d_Y(f(x), f(x')) \le K\, d_X(x, x'),$$
Figure 2-1: A memory-based learning system.

where $d_X, d_Y$ are metrics defined over spaces X and Y respectively. A function satisfying
this condition is called a Lipschitz function with bound K. A class of functions $\mathcal{F}$ is
uniformly Lipschitz bounded if there exists a bound K such that all functions in $\mathcal{F}$ are
K-Lipschitz functions. Call such an $\mathcal{F}$ a class of Lipschitz functions.
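Given only samples of a function, the Lipschitz condition yields a simple necessary test: any valid bound K must be at least the largest observed ratio $d_Y(f(x), f(x'))/d_X(x, x')$. A minimal sketch, assuming Euclidean $d_X$ on coordinate tuples and absolute difference as $d_Y$ (both can be swapped in as parameters). The function name is hypothetical.

```python
import itertools
import math

def empirical_lipschitz(points, f, d_x=math.dist, d_y=lambda a, b: abs(a - b)):
    """Largest ratio d_Y(f(x), f(x')) / d_X(x, x') over distinct sample pairs.
    Any K for which f is K-Lipschitz must be at least this value."""
    best = 0.0
    for x, x2 in itertools.combinations(points, 2):
        dx = d_x(x, x2)
        if dx > 0:
            best = max(best, d_y(f(x), f(x2)) / dx)
    return best

samples = [(0.0,), (1.0,), (2.5,)]
print(empirical_lipschitz(samples, lambda p: 3 * p[0] - 1))  # a line of slope 3
```

The estimate is only a lower bound on the true constant; membership in a uniformly Lipschitz bounded class is an assumption about the whole class, not something a finite sample can certify.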

Now let us define the s-median problem (based on Lin and Vitter [41]). In section 2.4
we discuss its relevance to efficient memory-based learning of Lipschitz functions. The
input is a complete (directed or undirected) graph G = (V, E) on n vertices. Non-negative
weights $c_{ij}$ are associated with the edges. We call the $c_{ij}$-s distances. The goal is to choose
a subset U of size s of the vertices that minimizes the sum of distances from each vertex
to its nearest neighbour in U. Call U the median set.
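To make the objective concrete, here is a sketch of the s-median cost and an exhaustive solver for tiny inputs. It assumes a distance matrix with $c_{vv} = 0$, so a median pays nothing for itself. The exhaustive search is exponential and is neither the Lin-Vitter scheme nor the greedy alternative discussed later. The function names are hypothetical.

```python
from itertools import combinations

def median_cost(c, U):
    """Sum over all vertices v of the distance c[v][u] to the nearest median u in U."""
    return sum(min(c[v][u] for u in U) for v in range(len(c)))

def best_median_set(c, s):
    """Exhaustive s-median: try every s-subset of the vertices (tiny inputs only)."""
    return min(combinations(range(len(c)), s), key=lambda U: median_cost(c, U))

# Four vertices on a line at positions 0, 1, 10, 11; distances are absolute differences.
pos = [0, 1, 10, 11]
c = [[abs(a - b) for b in pos] for a in pos]
U = best_median_set(c, 2)
print(U, median_cost(c, U))  # prints: (0, 2) 2
```

Since the problem is NP-hard, exhaustive search like this only serves as a reference point for checking approximation schemes on small instances.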

The s-median problem arises in data compression, network location, and clustering. It
is NP-hard even in Euclidean space [48, 57]. Lund and Yannakakis's lower bounds for
the set-covering problem imply that it is NP-hard to find ε-approximate solutions of size
$o(s \log |V|)$ to the s-median problem for ε sufficiently small [41, 46].

We generalize the analysis of Cornuejols et al. [21] to account for approximate solutions
of the s-median problem that are not necessarily of size s. We then evaluate the use of a
greedy approximation scheme as an alternative to the (also greedy) approximation scheme
used by Lin and Vitter [42] in their algorithm for memory-based learning of Lipschitz
functions. It is found to be easier to implement and to have better time complexity than
the scheme proposed by Lin and Vitter. In many cases it also yields a smaller Voronoi
system.

Section 2.2 describes known and novel results stating conditions for successful greedy
expert hiring. Section 2.3 reviews the Lin-Vitter approximation algorithm for the s-median
problem, and presents and analyzes a simpler greedy alternative. Section 2.4 describes how
these algorithms can be used as part of a memory-based learning system, and compares
them in that context.

2.2 Greedy Expert Hiring 

This section establishes conditions that guarantee lower bounds on the performance of the
greedy heuristic when it is applied to expert hiring problems. It uses mathematical tools
from the theory of coalitional games, which it first reviews. Subsection 2.2.2 describes
results proven by operations researchers that are relevant to hiring when the exact values
of the various sets of experts are known. The following subsection extends these bounds to
the situation when these values are known only approximately.

2.2.1 Coalitional Games, Concave Coalitional Games 

We begin by reviewing a few definitions and facts about coalitional games. A coalitional
game is a function

$$v: 2^N \to \Re$$

satisfying $v(\emptyset) = 0$. The set N = {1, ..., n} is commonly called the set of players, and the
set of its subsets, $2^N$, the set of coalitions. (Here we assume N is finite.) An introductory
text on coalitional game theory is by Owen [56].
A game is monotone if for all S, T ⊆ N such that S ⊆ T:

$$v(S) \le v(T).$$

A game is additive if for all S, T ⊆ N with S ∩ T = ∅:

$$v(S) + v(T) = v(S \cup T).$$

A game is subadditive if for all S, T ⊆ N with S ∩ T = ∅:

$$v(S) + v(T) \ge v(S \cup T).$$

A game is concave if it satisfies the condition of diminishing returns for all i ∈ N and
for all S, T such that S ⊆ T ⊆ N \ {i}:

$$v(S \cup \{i\}) - v(S) \ge v(T \cup \{i\}) - v(T).$$

These naming conventions are due to Shapley [63], who defined and investigated convex
games. Following Shapley, we justify the name "concave games" by defining a differencing
operator $\Delta_R$ for all R, S ⊆ N:

$$[\Delta_R v](S) = v(S \cup R) - v(S \setminus R).$$

If we let $\Delta_{QR} v$ denote $\Delta_Q(\Delta_R v)$, then the definition of concavity given above is equivalent
to the assertion that these "second differences" are everywhere non-positive, i.e. for all Q, R, S ⊆ N:

$$[\Delta_{QR} v](S) \le 0.$$

The operator $\Delta_{QR}$ is analogous to the second derivative associated with concave functions
in real analysis.
An equivalent definition of concave games is via the condition

$$\forall S, T \subseteq N:\; v(S) + v(T) \ge v(S \cup T) + v(S \cap T).$$

It follows that all concave games are subadditive.
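For small player sets these definitions can be checked by brute force over all coalitions, which is a convenient way to test candidate expert games. The sketch below (function names ours) checks monotonicity, the diminishing-returns form and the lattice form of concavity, and subadditivity. It is applied to a hypothetical coverage game and to the game $x_0$ used later in this section as a counterexample.

```python
from itertools import combinations

def subsets(players):
    """All coalitions of the given players, as frozensets."""
    players = list(players)
    return [frozenset(c) for r in range(len(players) + 1) for c in combinations(players, r)]

def is_monotone(v, players):
    subs = subsets(players)
    return all(v(S) <= v(T) for S in subs for T in subs if S <= T)

def is_concave(v, players):
    """Diminishing returns: v(S+i) - v(S) >= v(T+i) - v(T) whenever S <= T and i not in T."""
    subs = subsets(players)
    return all(v(S | {i}) - v(S) >= v(T | {i}) - v(T)
               for S in subs for T in subs if S <= T
               for i in players if i not in T)

def is_concave_lattice(v, players):
    """Equivalent form: v(S) + v(T) >= v(S | T) + v(S & T) for all S, T."""
    subs = subsets(players)
    return all(v(S) + v(T) >= v(S | T) + v(S & T) for S in subs for T in subs)

def is_subadditive(v, players):
    subs = subsets(players)
    return all(v(S) + v(T) >= v(S | T) for S in subs for T in subs if not S & T)

players = [1, 2, 3]
skills = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}          # hypothetical coverage game
cov = lambda S: len(set().union(*(skills[i] for i in S)))

M = 5                                                       # the game x_0 with M = 5
table = {(): 0, (1,): 1, (2,): 0, (3,): 0, (1, 2): 1, (2, 3): M, (1, 3): 1, (1, 2, 3): M}
x0 = lambda S: table[tuple(sorted(S))]

print(is_concave(cov, players), is_concave(x0, players), is_monotone(x0, players))
```

Adding an additive game, for example $S \mapsto -2|S|$, leaves every marginal-difference comparison unchanged, so the checker also confirms that games equivalent to a concave game are concave.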

Two games v, u defined on the same set of players N are termed equivalent if their
difference is an additive game, that is, if there exist constants $v_1, \ldots, v_n$ such that for all S ⊆ N

$$v(S) - u(S) = \sum_{i \in S} v_i.$$

Any game equivalent to a concave game is concave. Moreover, a non-negative scalar multiple of a
concave game is concave. Hence the set of concave games for a fixed N forms a convex
cone in the linear space $\Re^{2^n - 1}$. This cone contains the subspace of additive games.

2.2.2 Hiring Experts Using Exact Values of Coalitions 

Let N denote the set of experts. For a subset S ⊆ N of the experts, let x(S)
represent the value of this coalition for the manager. In the following we call x the coalitional
expert game.

Assume the manager has access to an oracle x that he can query for the value x(S) of
an arbitrary coalition S. If the manager is interested in hiring a set of k consultants, it is
natural for him to try a greedy approach. This means that he repeatedly hires the locally
optimal expert. Starting with an empty set of hired experts, the first expert to be hired is
the expert $e_1^{gr}$ maximizing $x(\{e_1\})$. After j experts $e_1^{gr}, \ldots, e_j^{gr}$ have been selected, the next
expert, $e_{j+1}^{gr}$, to be hired is the one satisfying

$$e_{j+1}^{gr} \in \arg\max\{x(e_1^{gr}, \ldots, e_j^{gr}, e) : e \in N \setminus \{e_1^{gr}, \ldots, e_j^{gr}\}\}.$$

We would like to derive a bound on the performance of this greedy heuristic that
is uniformly valid for a family of games, that is, a bound valid for all games in
the family. Let $\{e_1^{opt}, \ldots, e_k^{opt}\}$ be an optimal set of k experts, and let $X_k^{opt}$ be its value:
$X_k^{opt} = x(e_1^{opt}, \ldots, e_k^{opt})$. Let $\{e_1^{gr}, \ldots, e_j^{gr}\}$ be a set of j greedily hired experts of value
$X_j^{gr} = x(e_1^{gr}, \ldots, e_j^{gr})$. We would like the ratio

$$P_{j,k} = X_j^{gr} / X_k^{opt}$$

to be lower bounded by a function of j and k that does not depend on x.

This problem was considered by Nemhauser and Wolsey [51, 50] and by Nemhauser,
Wolsey and Fisher [52]. They prove that for games that are monotone and concave

$$P_{k,k} \ge 1 - \left(1 - \frac{1}{k}\right)^k > 1 - \frac{1}{e}.$$
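The bound $P_{k,k} \ge 1 - (1 - 1/k)^k$ can be observed empirically by comparing greedy hiring with exhaustive search on small monotone concave games. The sketch below uses randomly generated coverage games (a coalition is worth the number of distinct skills its members cover), a standard family of monotone games with diminishing returns. The instance generator and function names are hypothetical, and a passing check is an illustration, not a proof.

```python
import random
from itertools import combinations

def coverage_game(skills):
    return lambda S: len(set().union(*(skills[e] for e in S)))

def greedy_value(v, players, k):
    """Value of the coalition built by hiring, k times, the expert with the best enlarged value."""
    hired = frozenset()
    for _ in range(k):
        best = max((e for e in players if e not in hired), key=lambda e: v(hired | {e}))
        hired = hired | {best}
    return v(hired)

def optimal_value(v, players, k):
    return max(v(frozenset(S)) for S in combinations(players, k))

def check_bound(trials=200, n=6, k=3, seed=1):
    """True if greedy reaches 1 - (1 - 1/k)^k of the optimum on every random instance."""
    rng = random.Random(seed)
    bound = 1 - (1 - 1 / k) ** k
    for _ in range(trials):
        skills = {e: set(rng.sample(range(10), rng.randint(1, 4))) for e in range(n)}
        v = coverage_game(skills)
        if greedy_value(v, range(n), k) < bound * optimal_value(v, range(n), k) - 1e-9:
            return False
    return True

print(check_bound())
```

For k = 3 the guaranteed ratio is $1 - (2/3)^3 = 19/27$; in practice greedy is often optimal on such small instances, so the check mostly confirms that no instance falls below the bound.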

A similar bound can be proven on $P_{j,k}$ for arbitrary j. They also analyze a somewhat more
sophisticated and general approximation scheme for which they prove a matching inverse
bound:

Theorem 2.2.1 For a concave and monotone coalitional game, and for each integer q ≥ 0,
there is an algorithm that uses $O(n^{q+1})$ queries of oracle x and finds a coalition $E_q$ of
arbitrary size k for which

$$\frac{x(E_q)}{X_k^{opt}} \ge P_q = 1 - \frac{k-q}{k}\left(\frac{k-q-1}{k-q}\right)^{k-q}.$$

For any integer q ≥ 0, $P_q$ is the best ratio achievable by an algorithm that uses $O(n^{q+1})$
queries to find a coalition of size k.

2.2.3 Hiring Experts Using Approximate Values of Coalitions 

In practice the assumption that the manager has access to an oracle providing him with
the precise values of expert coalitions might be unrealistic. In some situations the values
of coalitions may have to be estimated by a stochastic process, e.g. by experiments the
manager performs with those coalitions. The manager then might have to make his decisions
based on approximate values of coalitions, rather than on exact values. This section
establishes a uniform lower bound on the performance of a greedy manager in these
circumstances. We model the situation by letting the manager query an oracle that gives
approximations of the coalitions' values.

An α-approximate oracle for game x, denoted $x^{\sim\alpha}$, is an oracle that, when queried with
a coalition S of players, returns arbitrary values satisfying

$$\alpha\, x(S) \le x^{\sim\alpha}(S) \le \alpha^{-1} x(S) \tag{2.1}$$

for $0 < \alpha \le 1$.
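We prove that

Theorem 2.2.2 For a coalition of size j, $E_j^{gr,\sim\alpha}$, that was hired greedily with respect to an
α-approximate coalitional game oracle $x^{\sim\alpha}$ of a concave and monotone game x,

$$x(E_j^{gr,\sim\alpha}) \ge \max_{k=1,\ldots,n} \left\{\left(1 - \left(1 - \frac{\alpha^2}{k}\right)^j\right) X_k^{opt}\right\} \ge \max_{k=1,\ldots,n}\left\{\left(1 - e^{-\alpha^2 j/k}\right) X_k^{opt}\right\}.$$

The proof of this theorem is presented at the end of the section.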

Definition 2.2.1 For 0 < α ≤ 1, call a coalition $\{e_1^{gr,\alpha}, \ldots, e_j^{gr,\alpha}\}$ α-greedily hirable if it
can be ordered $e_1^{gr,\alpha}, \ldots, e_j^{gr,\alpha}$ so that for all l, 0 ≤ l < j:

$$x(e_1^{gr,\alpha}, \ldots, e_l^{gr,\alpha}, e_{l+1}^{gr,\alpha}) \ge \alpha \max_e x(e_1^{gr,\alpha}, \ldots, e_l^{gr,\alpha}, e). \tag{2.2}$$

Claim 2.2.1 A coalition $\{e_1^{gr,\alpha}, \ldots, e_j^{gr,\alpha}\}$ is α-greedily hirable with respect to game oracle
x if and only if it is greedily hirable with respect to a $\sqrt{\alpha}$-approximate oracle for the game x.



Proof: (⟹) Let $e_1^{gr,\alpha}, \ldots, e_j^{gr,\alpha}$ be the ordering with respect to which coalition

$$E_j^{gr,\alpha} = \{e_1^{gr,\alpha}, \ldots, e_j^{gr,\alpha}\}$$

is α-greedily hirable. Define a $\sqrt{\alpha}$-approximate oracle y for the game x. It returns $\alpha^{-1/2} x(S)$
for the j coalitions of the form $S_m = \{e_1^{gr,\alpha}, \ldots, e_m^{gr,\alpha}\}$ where m ≤ j. It returns $\alpha^{1/2} x(S)$
for all other coalitions. Then from equation (2.2) it follows that coalition $E_j^{gr,\alpha}$ is greedily
hirable with respect to oracle y: indeed, $y(S_{l+1}) = \alpha^{-1/2} x(S_{l+1}) \ge \alpha^{1/2} \max_e x(S_l \cup \{e\})$,
which is at least the value y assigns to any other candidate coalition $S_l \cup \{e\}$.

(⟸) If for two coalitions we have $x^{\sim\beta}(S) \le x^{\sim\beta}(T)$, then

$$\beta\, x(S) \le x^{\sim\beta}(S) \le x^{\sim\beta}(T) \le \beta^{-1} x(T)$$

implies

$$x(T) \ge \beta^2 x(S).$$

Thus greedy hiring with respect to $x^{\sim\sqrt{\alpha}}$ yields α-greedy hiring with respect to x. □
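The (⟸) direction of Claim 2.2.1 can be exercised numerically. The sketch below builds a hypothetical coverage game, wraps it in a randomly perturbed oracle satisfying (2.1) with $\beta = \sqrt{\alpha}$ (each coalition's perturbation is drawn once and cached), hires greedily with respect to the perturbed values, and then checks condition (2.2) with the true values. It tests only the claim, not Theorem 2.2.2, and all names are ours.

```python
import math
import random

def coverage_game(skills):
    return lambda S: len(set().union(*(skills[e] for e in S)))

def noisy_oracle(x, beta, rng):
    """A beta-approximate oracle as in (2.1): beta*x(S) <= value <= x(S)/beta, fixed per coalition."""
    cache = {}
    def oracle(S):
        if S not in cache:
            cache[S] = x(S) * rng.uniform(beta, 1 / beta)
        return cache[S]
    return oracle

def greedy_order(oracle, players, j):
    """Greedy hiring driven by the (approximate) oracle values."""
    order = []
    for _ in range(j):
        order.append(max((e for e in players if e not in order),
                         key=lambda e: oracle(frozenset(order) | {e})))
    return order

def is_alpha_greedy(x, order, players, alpha, tol=1e-9):
    """Condition (2.2), checked with the true game values."""
    for l in range(len(order)):
        prefix = frozenset(order[:l])
        best = max(x(prefix | {e}) for e in players if e not in prefix)
        if x(prefix | {order[l]}) < alpha * best - tol:
            return False
    return True

def check_claim(seed, n=7, j=4, alpha=0.64):
    rng = random.Random(seed)
    skills = {e: set(rng.sample(range(12), rng.randint(1, 5))) for e in range(n)}
    x = coverage_game(skills)
    order = greedy_order(noisy_oracle(x, math.sqrt(alpha), rng), range(n), j)
    return is_alpha_greedy(x, order, range(n), alpha)

print(all(check_claim(seed) for seed in range(20)))
```

The small tolerance only absorbs floating-point rounding in the products $x(S)\cdot$ noise; the inequality $x(T) \ge \beta^2 x(S)$ itself is exact.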

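Denote by $\mathcal{S}_j^{gr,\alpha}$ the set of all α-greedily hirable coalitions of size j and let

$$X_j^{gr,\alpha} = \min_{E \in \mathcal{S}_j^{gr,\alpha}} x(E), \qquad P_{j,k}^{\alpha} = X_j^{gr,\alpha} / X_k^{opt}.$$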

Theorem 2.2.3 For expert games which are concave and monotone, any 0 < α ≤ 1
and integers j, k:

$$P_{j,k}^{\alpha} \ge 1 - \left(1 - \frac{\alpha}{k}\right)^j \ge 1 - e^{-\alpha j/k}.$$

Neither concavity alone nor monotonicity alone guarantees a non-trivial bound on $P_{j,k}^{\alpha}$ that
is uniformly valid for all games.

Proof: Let $E_j^{gr,\alpha} = \{e_1^{gr,\alpha}, \dots, e_j^{gr,\alpha}\}$ denote one of the worst $\alpha$-greedily hirable coalitions of $j$ experts. Assume w.l.o.g. that $e_1^{gr,\alpha}, \dots, e_j^{gr,\alpha}$ are ordered to satisfy (2.2), write $E_l = \{e_1^{gr,\alpha}, \dots, e_l^{gr,\alpha}\}$ for the first $l$ hired experts, and let $\{e_1^{opt}, \dots, e_k^{opt}\}$ be an optimal coalition of size $k$, so that $X_k^{opt} = \chi(\{e_1^{opt}, \dots, e_k^{opt}\})$. For $l = 0, \dots, j-1$:

$$\begin{aligned}
X_k^{opt} - \chi(E_l)
&\le \chi(E_l \cup \{e_1^{opt}, \dots, e_k^{opt}\}) - \chi(E_l) && (\chi \text{ is monotone})\\
&= \sum_{i=1}^{k} \left[\chi(E_l \cup \{e_1^{opt}, \dots, e_i^{opt}\}) - \chi(E_l \cup \{e_1^{opt}, \dots, e_{i-1}^{opt}\})\right]\\
&\le \sum_{i=1}^{k} \left[\chi(E_l \cup \{e_i^{opt}\}) - \chi(E_l)\right] && (\chi \text{ is concave})\\
&\le \frac{k}{\alpha}\left[\chi(E_l \cup \{e_{l+1}^{gr,\alpha}\}) - \chi(E_l)\right].
\end{aligned}$$

We get that

$$\chi(E_{l+1}) \ge \left(1 - \frac{\alpha}{k}\right)\chi(E_l) + \frac{\alpha}{k} X_k^{opt},$$

or

$$\frac{\chi(E_{l+1})}{X_k^{opt}} \ge \left(1 - \frac{\alpha}{k}\right)\frac{\chi(E_l)}{X_k^{opt}} + \frac{\alpha}{k}.$$

Equivalently, $1 - \chi(E_{l+1})/X_k^{opt} \le (1 - \alpha/k)\,(1 - \chi(E_l)/X_k^{opt})$. Since $\chi(E_0) = \chi(\emptyset) = 0$, induction on $l$ gives:

$$P_{j,k}^{\alpha} = \frac{\chi(E_j^{gr,\alpha})}{X_k^{opt}} \ge 1 - \left(1 - \frac{\alpha}{k}\right)^j \ge 1 - e^{-\alpha j/k}.$$
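The bound of Theorem 2.2.3 can be spot-checked numerically for $\alpha = 1$. The sketch below is a minimal illustration under stated assumptions: it uses a coverage game (the value of a coalition is the number of distinct items its experts cover), a standard example of a concave and monotone game. The game, the expert names and the helper functions are hypothetical and are not taken from the text.

```python
from itertools import combinations
import math

def coverage_value(coalition, covers):
    """chi(S): number of distinct items covered by the experts in S (chi of the empty set is 0)."""
    covered = set()
    for e in coalition:
        covered |= covers[e]
    return len(covered)

def greedy_coalition(covers, j):
    """Exact greedy hiring (alpha = 1): repeatedly add the expert that maximizes chi."""
    chosen = []
    for _ in range(j):
        candidates = [e for e in covers if e not in chosen]
        best = max(candidates, key=lambda e: coverage_value(chosen + [e], covers))
        chosen.append(best)
    return chosen

def optimum(covers, k):
    """X_k^opt, found by brute force over all coalitions of size k."""
    return max(coverage_value(c, covers) for c in combinations(covers, k))

def check_bound(covers, max_size):
    """True if chi(E_j^gr) >= (1 - (1 - 1/k)^j) * X_k^opt for all j, k <= max_size."""
    for j in range(1, max_size + 1):
        greedy_value = coverage_value(greedy_coalition(covers, j), covers)
        for k in range(1, max_size + 1):
            bound = (1 - (1 - 1 / k) ** j) * optimum(covers, k)
            if greedy_value < bound - 1e-9:
                return False
            # The weaker exponential form of the bound is implied by the first one.
            assert 1 - (1 - 1 / k) ** j >= 1 - math.exp(-j / k) - 1e-12
    return True

# Hypothetical game: each expert covers a few items.
covers = {"e1": {1, 2, 3}, "e2": {3, 4}, "e3": {4, 5, 6}, "e4": {1, 6}, "e5": {7}}
print(check_bound(covers, 3))  # True
```

Brute-force optimization keeps the check honest but limits it to very small games.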



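We now show that even for $\alpha = 1$ neither condition can be dropped.

Concavity is necessary: For an arbitrary natural number $M$ consider the following game $\chi_0$ that is monotone but not concave:

$$\begin{aligned}
&\chi_0(\emptyset) = 0,\\
&\chi_0(\{1\}) = 1, \quad \chi_0(\{2\}) = 0, \quad \chi_0(\{3\}) = 0,\\
&\chi_0(\{1,2\}) = 1, \quad \chi_0(\{2,3\}) = M, \quad \chi_0(\{1,3\}) = 1,\\
&\chi_0(\{1,2,3\}) = M.
\end{aligned}$$

Greedy hiring picks expert 1 first, after which either second expert yields value 1. Then $X_2^{gr,1} = 1$ and $X_2^{opt} = M$, giving the ratio $P_{2,2}^{1} = 1/M$, which is not bounded from below by any positive constant.

Monotonicity is necessary: For an arbitrary natural $M$ consider a game $\chi_1$ that is concave but not monotone:

$$\begin{aligned}
&\chi_1(\emptyset) = 0,\\
&\chi_1(\{1\}) = M + 1, \quad \chi_1(\{2\}) = M, \quad \chi_1(\{3\}) = M,\\
&\chi_1(\{1,2\}) = 1, \quad \chi_1(\{2,3\}) = 2M - 1, \quad \chi_1(\{1,3\}) = 1,\\
&\chi_1(\{1,2,3\}) = M - 1.
\end{aligned}$$

Again greedy hiring picks expert 1 first. Then $X_2^{gr,1} = 1$ and $X_2^{opt} = 2M - 1$. Hence $P_{2,2}^{1} = 1/(2M - 1)$ can be arbitrarily small. □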
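The first counterexample can be checked mechanically. The following sketch uses hypothetical helper names. It enumerates all coalitions of the three players, confirms that $\chi_0$ is monotone but violates the diminishing-returns condition, and shows that greedy hiring of two experts reaches value 1 while the optimum is $M$.

```python
from itertools import combinations

PLAYERS = (1, 2, 3)

def make_chi0(M):
    """The monotone, non-concave game chi_0 of the text."""
    table = {(): 0, (1,): 1, (2,): 0, (3,): 0,
             (1, 2): 1, (2, 3): M, (1, 3): 1, (1, 2, 3): M}
    return lambda S: table[tuple(sorted(S))]

def all_coalitions():
    return [c for r in range(len(PLAYERS) + 1) for c in combinations(PLAYERS, r)]

def is_monotone(chi):
    return all(chi(S) <= chi(S + (p,))
               for S in all_coalitions() for p in PLAYERS if p not in S)

def is_concave(chi):
    """Diminishing returns: the gain from p at S is at least the gain at any T containing S."""
    for S in all_coalitions():
        for T in all_coalitions():
            if set(S) <= set(T):
                for p in PLAYERS:
                    if p not in T and chi(S + (p,)) - chi(S) < chi(T + (p,)) - chi(T):
                        return False
    return True

def greedy(chi, j):
    """Greedy hiring of j players; ties go to the smallest player number."""
    chosen = ()
    for _ in range(j):
        best = max((p for p in PLAYERS if p not in chosen),
                   key=lambda p: chi(chosen + (p,)))
        chosen += (best,)
    return chosen

M = 1000
chi0 = make_chi0(M)
best_pair = max(combinations(PLAYERS, 2), key=chi0)
print(is_monotone(chi0), is_concave(chi0), chi0(greedy(chi0, 2)), chi0(best_pair))
# True False 1 1000
```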

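The bound on $P_{j,k}^{\alpha}$ holds simultaneously for optimal solutions of all sizes. This allows us to state a stronger lower bound on the performance of $\alpha$-greedy hiring.

Corollary 2.2.1 For an $\alpha$-greedily hirable coalition $E_j^{gr,\alpha}$ of size $j$ in a concave and monotone game $\chi$:

$$\chi(E_j^{gr,\alpha}) \ge \max_{k=1,\dots,n}\left\{\left(1-\left(1-\frac{\alpha}{k}\right)^j\right) X_k^{opt}\right\} \ge \max_{k=1,\dots,n}\left\{\left(1-e^{-\alpha j/k}\right) X_k^{opt}\right\}.$$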

We conclude this section by proving Theorem 2.2.2. 

Proof (of Theorem 2.2.2): The theorem follows from Claim 2.2.1 and Corollary 2.2.1. 

□ 



2.3 Two Approximation Algorithms for the s-Median Problem

Lin and Vitter [42] present an algorithm that finds an approximate solution to the s-median 
problem by solving a linear programming problem and then applying the greedy heuristic 
to the solution. Using the results in the previous section we analyze the performance of the 
greedy heuristic when applied to the same problem directly and then compare the bounds 
these two approaches yield. 
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2.3.1 The Lin-Vitter Algorithm, Review 

For a point $x$ in a metric space $X$ and a set $S \subseteq X$ it is common to define the distance of $x$ to $S$ as

$$d_X(x, S) = \min\{d_X(x, y) : y \in S\}.$$

The $s$-median problem is the problem of finding, for a given finite set of points $\xi$, a subset of $\xi$ of size $s$, called the median set, for which the average distance of points in $\xi$ to the median set is minimal.

The $s$-median problem for a set $\xi$ of $m$ points, $\xi = \{x_1, \dots, x_m\}$, can be formulated as a 0-1 integer program of minimizing

$$d_\xi(U) = \frac{1}{m}\sum_{i=1}^{m} d_X(x_i, U) = \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{m} p_{ij}\, d_X(x_i, x_j)$$

subject to

$$\begin{aligned}
&\textstyle\sum_{j=1}^{m} p_{ij} = 1, && i = 1, \dots, m;\\
&p_{ij} \le q_j, && i, j = 1, \dots, m;\\
&\textstyle\sum_{j=1}^{m} q_j = s;\\
&p_{ij}, q_j \in \{0, 1\}, && i, j = 1, \dots, m;
\end{aligned}$$

where $q_j = 1$ iff $x_j$ is chosen as a cluster center, and $p_{ij} = 1$ iff $q_j = 1$ and $x_i$ is "assigned" to $x_j$'s cluster.

The linear program relaxation of the above allows $q_j$ and $p_{ij}$ to take arbitrary values in the interval $[0, 1]$. The value of an optimal fractional solution (linear program solution) is a lower bound on the value of solutions of the discrete $s$-median problem.

The Lin-Vitter algorithm works as follows: 

1. Solve the linear program relaxation of the discrete $s$-median problem by linear programming techniques; denote the fractional solution by $q, p$.

2. For each $i = 1, \dots, m$ compute $D_i = \sum_{j=1}^{m} d_X(x_i, x_j)\, p_{ij}$.

3. Given a relative error bound $\epsilon > 0$, for each $j$ such that $q_j > 0$, construct a set $S_j$. A point $x_i$ is in $S_j$ iff $d_X(x_i, x_j) \le (1 + \epsilon) D_i$. (Note that $x_j \in S_j$ for all $S_j$.)

4. Apply the greedy set cover algorithm [19, 36] to the covering of $\xi$ by the sets $\{S_j\}$, choosing iteratively the set $S_j$ that covers the most uncovered points. Repeat this process until all points of $\xi$ are covered. Let $I_U$ be the set of indices of sets chosen by the greedy set-covering heuristic. Output $U = \{x_i\}_{i \in I_U}$ as the median set.
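Steps 2 to 4 can be sketched as follows. This is a minimal illustration, not Lin and Vitter's implementation. It assumes step 1 has already been solved elsewhere, so the fractional values `q` and `p` are passed in as plain lists, and the function name is hypothetical. For a feasible LP solution every $x_i$ has some $j$ with $p_{ij} > 0$ and $d_X(x_i, x_j) \le D_i$, so the error branch should never trigger.

```python
def lin_vitter_rounding(dist, p, q, eps):
    """Steps 2-4 of the Lin-Vitter algorithm, given a fractional LP solution (q, p).

    dist[i][j] is d_X(x_i, x_j). Returns the chosen index set I_U, in the
    order in which the greedy set cover picked the sets.
    """
    m = len(dist)
    # Step 2: D_i = sum_j d_X(x_i, x_j) * p_ij
    D = [sum(dist[i][j] * p[i][j] for j in range(m)) for i in range(m)]
    # Step 3: S_j = {i : d_X(x_i, x_j) <= (1 + eps) * D_i}, for every j with q_j > 0
    sets = {j: {i for i in range(m) if dist[i][j] <= (1 + eps) * D[i]}
            for j in range(m) if q[j] > 0}
    # Step 4: greedy set cover, each time taking the set covering the most uncovered points
    uncovered, chosen = set(range(m)), []
    while uncovered:
        best = max(sets, key=lambda c: len(sets[c] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the sets S_j do not cover all points")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

# Hypothetical example: four points on a line, with an integral LP solution.
xs = [0, 1, 10, 11]
dist = [[abs(a - b) for b in xs] for a in xs]
p = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
q = [1, 0, 1, 0]
print(lin_vitter_rounding(dist, p, q, 0.5))  # [0, 2]
```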

The linear programming problem can be solved in provably polynomial time by the 
ellipsoid algorithm [39] or by the interior point method [37]. The simplex method [23] works 
very efficiently in practice, although in the worst case its performance is not polynomial- 
time. 

Lin and Vitter [41] show: 

Theorem 2.3.1 Given any $\epsilon > 0$, the Lin-Vitter algorithm outputs a set $U$ of size at most

$$s\left(1 + \frac{1}{\epsilon}\right)(\ln m + 1)$$

such that

$$d_\xi(U) \le (1 + \epsilon) D,$$

where $D$ is the average distance of the optimal fractional solution for the discrete $s$-median problem.

2.3.2 A Simple and Efficient Greedy Algorithm 

Cornuejols et al. [21] were the first to derive a bound on the performance of the greedy heuristic for the s-median problem. The bound they showed is somewhat stronger than the bound in Theorem 2.3.2, as explained towards the end of this subsection. Subsequently, Nemhauser et al. [52] generalized the result to concave and monotone coalitional games. We show a derivation of the bound for the s-median problem from the general bound for coalitional games. We also generalize their analysis to allow approximation of the s-median set by sets of size other than s.
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Description of the Algorithm. 

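Given a set $\xi = \{x_i\}_{i=1}^{m}$ of $m$ points in $X$ and a nonnegative integer $t$, where $t \le m$, the algorithm selects a subset of size $t$ of $\xi$. The algorithm works as follows:

1. Set $S = \emptyset$.

2. Choose $x \in \arg\min_{x \in \xi \setminus S} d_\xi(S \cup \{x\})$ and set $S = S \cup \{x\}$.

3. Repeat step 2 until $|S| = t$.

The time complexity of this algorithm is $O(tm^2)$, provided the current distance $d_X(y, S)$ of every point $y$ is kept, so that each candidate is evaluated in $O(m)$ time.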
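A minimal sketch of the greedy heuristic, assuming the points and the metric are supplied by the caller. The function name is hypothetical. To stay within $O(tm^2)$ it keeps, for every point, its distance to the current set $S$, so evaluating one candidate costs $O(m)$. Ties are broken in favour of the earlier point.

```python
import math

def greedy_s_median(points, t, dist):
    """Greedy s-median: t times, add the point that minimizes the average distance."""
    m = len(points)
    d = [[dist(a, b) for b in points] for a in points]   # O(m^2) distance table
    nearest = [math.inf] * m   # nearest[i] = d_X(x_i, S) for the current S
    chosen = []
    chosen_set = set()
    for _ in range(t):
        best_c, best_cost = None, math.inf
        for c in range(m):
            if c in chosen_set:
                continue
            cost = sum(min(nearest[i], d[i][c]) for i in range(m))   # O(m) per candidate
            if cost < best_cost:
                best_c, best_cost = c, cost
        chosen.append(best_c)
        chosen_set.add(best_c)
        nearest = [min(nearest[i], d[i][best_c]) for i in range(m)]
    return [points[c] for c in chosen], sum(nearest) / m

# Hypothetical example: two clusters on a line.
print(greedy_s_median([0, 1, 2, 10, 11, 12], 2, lambda a, b: abs(a - b)))
```

On this input the call returns the median set `[2, 11]` with average distance 5/6.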

The Derived Expert-Game for This Problem is Concave. 

Define a value function for the expert game by defining $\chi$ as follows:

$$\forall S \subseteq \xi: \quad \chi(S) = -d_\xi(S) + C,$$

where $C$ is a normalizing constant that guarantees concavity. We assign an artificial value of $C$ to $d_\xi(\emptyset)$, thus $\chi(\emptyset) = 0$. The proper selection of $C$ is discussed in Section 2.3.2.

We can show the concavity of this game via the condition of diminishing returns. Indeed, for an $R \subseteq \xi$ let $N_R(x)$ denote the neighbours in $\xi$ of a point $x \in R$. This is the set of points of $\xi$ that are closer to $x$ than to any other point of $R$:

$$N_R(x) = \{y \in \xi : x \in \arg\min_{z \in R} d_X(y, z)\}.$$

When adding $x$ to coalition $S$, the distance $d_X(y, S)$ may differ from the distance $d_X(y, S \cup \{x\})$ only for points $y \in N_{S \cup \{x\}}(x)$. The set of such points shrinks as the base coalition grows:

$$S \subseteq T \not\ni x \implies N_{S \cup \{x\}}(x) \supseteq N_{T \cup \{x\}}(x).$$

Hence, the set of points $y$ for which $d_X(y, U)$ is reduced by adding $x$ to $U$ for $U = S$ is a superset of such points for $U = T$. As the candidate median set $U$ grows, the distance $d_X(y, U)$ decreases weakly for all points $y \in \xi$. That is,

$$S \subseteq T \implies d_X(y, S) \ge d_X(y, T).$$

Since the reduction for a single point, $d_X(y, U) - d_X(y, U \cup \{x\}) = \max\{0, d_X(y, U) - d_X(y, x)\}$, is nondecreasing in $d_X(y, U)$, it follows that as the candidate median set grows, the gain achieved by adding one more point to the set diminishes: for nonempty $S \subseteq T$ and $x \notin T$,

$$\chi(S \cup \{x\}) - \chi(S) \ge \chi(T \cup \{x\}) - \chi(T).$$



Choosing C. 



The previous subsection proves that the expert game is concave for coalitions of size greater than or equal to 1. Note that the additive constant $C$ cancels out in the condition of diminishing returns between nonempty coalitions, and hence its value does not matter there. To complete the proof, we have to define $C = d_\xi(\emptyset)$ in a way that will not violate concavity. Choosing a big $C$ would do, but this would weaken the bound we get in Section 2.3.2. The diameter of a set of points is defined as:

$$\operatorname{diam} S = \sup\{d_X(x, y) : x, y \in S\}.$$

Let $C = 2 \cdot \operatorname{diam} \xi$, as

$$\max\{d_\xi(\{x\}) : x \in \xi\} \le \operatorname{diam} \xi$$

and

$$\max\{d_\xi(S) - d_\xi(S \cup \{x\}) : \emptyset \ne S \subseteq \xi,\ x \in \xi\} \le \operatorname{diam} \xi$$

guarantee together that for all $x \in \xi$ and all nonempty $S \subseteq \xi \setminus \{x\}$:

$$d_\xi(S) - d_\xi(S \cup \{x\}) \le \operatorname{diam} \xi \le d_\xi(\emptyset) - d_\xi(\{x\}).$$
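The choice $C = 2 \cdot \operatorname{diam} \xi$ can be checked by brute force on a small instance. In the sketch below the five points and the Manhattan metric are hypothetical choices. Integer distances together with `fractions.Fraction` keep every comparison exact. The check confirms that with $C = 2\operatorname{diam}\xi$ the game is monotone and has diminishing returns over all coalitions, including the empty one. With $C = 0$ both properties fail.

```python
from fractions import Fraction
from itertools import combinations

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def make_game(points, C):
    """chi(S) = C - d_xi(S) for nonempty S, and chi(empty) = 0 (i.e. d_xi(empty) := C)."""
    m = len(points)
    def chi(S):
        if not S:
            return Fraction(0)
        total = sum(min(manhattan(x, points[u]) for u in S) for x in points)
        return C - Fraction(total, m)   # average distance d_xi(S), kept exact
    return chi

def coalitions(n):
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def is_monotone(chi, n):
    return all(chi(S) <= chi(S | {p}) for S in coalitions(n) for p in range(n))

def is_concave(chi, n):
    subsets = coalitions(n)
    return all(chi(S | {p}) - chi(S) >= chi(T | {p}) - chi(T)
               for T in subsets for S in subsets if S <= T
               for p in range(n) if p not in T)

points = [(0, 0), (1, 0), (4, 1), (5, 5), (2, 3)]   # hypothetical instance
diam = max(manhattan(a, b) for a, b in combinations(points, 2))
good = make_game(points, 2 * diam)
bad = make_game(points, 0)
print(is_monotone(good, 5), is_concave(good, 5))   # True True
print(is_monotone(bad, 5), is_concave(bad, 5))     # False False
```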

Estimating the Quality of the Approximation. 

Each s-median problem is equivalent to a concave and monotone expert game. This can be 
used to bound the quality of approximation the greedy algorithm yields for this problem. 
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Notation 2.3.1 Let Dj = d^(S^ r ) denote the average distance of the approximation of size 
j produced by the greedy algorithm; let D = d^(S^ pt ) denote the average distance of the 
optimal solution of size s; and let Pj jS = D 3 /D be the ratio between them. 



Theorem 2.2.3 gives us 



Oi 



■ D < + c >i- ,-»: 



-D + C 



Dj < D[l + e-^ - 1)]. 

To allow a comparison to Theorem 2.3.1 we let e = e~ J ' s (C / D — 1). Solving this for j 
we get 

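Theorem 2.3.2 Given any $\epsilon > 0$, the greedy algorithm outputs a set $U$ of size

$$s\left(\ln\frac{1}{\epsilon} + \ln\left(\frac{2\operatorname{diam}\xi}{D} - 1\right)\right) \le s \ln\frac{2\operatorname{diam}\xi}{\epsilon D}$$

such that

$$d_\xi(U) \le (1 + \epsilon) D,$$

where $D$ is the average distance of the optimal solution for the discrete $s$-median problem.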
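To make the comparison with Theorem 2.3.1 concrete, the two size bounds can simply be evaluated. The function names and the parameter values below are hypothetical illustrations. In practice the greedy size would be rounded up to an integer. Which bound is smaller depends on how $\operatorname{diam}\xi / D$ compares with $m$. The sketch shows only the different dependence on $1/\epsilon$.

```python
import math

def greedy_size(s, eps, diam, D):
    """Size bound of Theorem 2.3.2 with C = 2 diam: s (ln(1/eps) + ln(2 diam / D - 1))."""
    return s * (math.log(1 / eps) + math.log(2 * diam / D - 1))

def greedy_size_upper(s, eps, diam, D):
    """The simpler upper bound s ln(2 diam / (eps D))."""
    return s * math.log(2 * diam / (eps * D))

def lin_vitter_size(s, eps, m):
    """Size bound of Theorem 2.3.1: s (1 + 1/eps)(ln m + 1)."""
    return s * (1 + 1 / eps) * (math.log(m) + 1)

for eps in (0.1, 0.01):
    print(eps, greedy_size(10, eps, 1.0, 0.05), lin_vitter_size(10, eps, 1000))
```

Going from $\epsilon = 0.1$ to $\epsilon = 0.01$ adds about $s \ln 10$ to the greedy bound but multiplies the Lin-Vitter bound by roughly nine.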

Linear Programming vs. Greedy. 

To compare Lin and Vitter's $s$-median approximation algorithm to the greedy algorithm described in this section, note that their theorem gives a uniform bound for all input graphs satisfying $|V| \le m$, while our bound is uniform for graphs with an identical $\operatorname{diam}\xi / D$ ratio.

The size of the greedy algorithm's output grows logarithmically rather than linearly with $1/\epsilon$. Easy implementation is another potential advantage of a vanilla greedy approach. Lin and Vitter express the quality of approximation in terms of the optimal fractional solution, while Theorem 2.3.2 expresses the quality of approximation in terms of the optimal integral solution. Cornuejols et al. [21] and Nemhauser et al. [52] show that the bounds of Theorem 2.2.3 hold relative to the optimal fractional solution of the linear programming formulation of the $s$-median problem for $j = k$ and $\alpha = 1$. Similarly, the integral optimum $D$ in Theorem 2.3.2 can be replaced by the value of the optimal fractional solution.




Contributions of This Work to the Analysis of the Greedy Heuristic's Perfor- 
mance. 

Previous results on the performance of the greedy heuristic by Cornuejols et al. [21] and Nemhauser et al. [52] do not allow its comparison to the Lin-Vitter approximation algorithm, as they do not consider a relaxation of the requirement on the desired size of the approximating set. Their analysis bounds only the ratio we denoted $P_{k,k}^{1}$ and not the more general $P_{j,k}^{\alpha}$.

Yet another novelty of our work is the proof presented in this section. Historically, 
Cornuejols et al. derived a lower bound on the quality of a greedy approximation to the 
solution of an s-median problem, which was subsequently generalized to arbitrary concave 
and monotone games by Nemhauser et al. Our proof, by contrast, proceeds from the general 
to the specific. 

2.4 Application to Memory-Based Learning 

Having analyzed the performance of a greedy alternative to the approximation algorithm Lin and Vitter present for the s-median problem, this section compares the performance of the two approximation algorithms as tools in the construction of Voronoi systems that model Lipschitz functions. It begins with a review of the learning algorithm. Then it compares the size of the Voronoi system required by the original algorithm of Lin and Vitter to that required by the greedy alternative analyzed in the previous section for the same user specification of accuracy and confidence. It concludes with a review of the proof that the Lin-Vitter algorithm indeed works (with either approximation subroutine).

2.4.1 The Learning Algorithm 

Lin and Vitter [42] propose to learn classes of uniformly Lipschitz bounded functions by Voronoi systems of polynomial size with respect to the error measure

$$\mathrm{er}_{P_X}(f, g) = \mathbf{E}_X[d_Y(g(x), f(x))] = \int_X d_Y(g(x), f(x))\, dP_X.$$

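Let $Q_{P_X}(X, \epsilon, d_X)$ denote the quantization number, defined to be the smallest integer $s$ such that there exists a Voronoi encoder $\gamma$ of size $s$ that satisfies $\mathbf{E}[d_X(x, \gamma(x))] \le \epsilon$. The algorithm draws

$$m = \Omega\left(\frac{s \dim X \operatorname{diam} Y}{\epsilon} \log\frac{\operatorname{diam} Y}{\epsilon} \log s + \frac{\operatorname{diam} Y}{\epsilon} \log\frac{1}{\delta}\right) \qquad (2.3)$$

examples, where $s = Q_{P_X}(X, \epsilon', d_X)$ for a suitable accuracy $\epsilon'$ proportional to $\epsilon/K$. It runs an $s$-median approximation algorithm on the sample that was drawn. The resulting median set is used to build a Voronoi system, which is output by the algorithm.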

Section 2.4.3 reviews the proof that for any given $\epsilon$, $\delta$ and any target function $f$ in the class, the algorithm outputs a Voronoi system which implements a function $h$ for which, with confidence of at least $1 - \delta$,

$$\mathrm{er}_{P_X}(f, h) \le \epsilon.$$

2.4.2 Comparing the Two s-Median Approximation Subroutines 

The size of a Voronoi system produced by the Lin-Vitter approximation algorithm is

$$O\left(\frac{s \cdot K \cdot \operatorname{diam} Y \cdot \log m}{\epsilon}\right). \qquad (2.4)$$

If a priori information on the distribution of the input points is available, a lower bound $d \le D$ on $D$ may hold almost everywhere, that is, for all of the space except, possibly, a set of measure zero. For example, for $m$ input points drawn from the uniform distribution on a region of area $A$ in the plane, with probability one the value of a solution to the $s$-median problem is lower bounded by $\beta(m - s)\sqrt{A/s}$, for some constant $\beta$, as shown by Fisher and Hochbaum [26]. Then the vanilla greedy algorithm may be used to produce a system of size $O(s \cdot \log(\cdots))$.

Since a confidence parameter is inherent in the evaluation of the performance of PAC learning systems, the following simpler analysis suffices for a better comparison of the two approximation schemes in the context of learning. Up to the factor $(m - s)/m$, the value of $D$ is lower bounded by the distance between the two nearest points in $\xi$. For an ordered set of $m$ points drawn independently with respect to $P_X$, let $N_1^m$ denote the distance between the first point and its nearest neighbour in the set:

$$N_1^m = \min\{d_X(x_1, x_i) : i = 2, 3, \dots, m\}.$$

Consider, for example, the case of a region $X \subseteq \mathbb{R}^n$ where $d_X$ is defined by an $l_p$ norm $|x| = (\sum_i |x_i|^p)^{1/p}$, and $P_X$ has a bounded density, $P_X \le P$. Since an $l_p$ ball of radius $\epsilon$ lies inside a cube of side $2\epsilon$, and there are $m - 1$ other points,

$$\Pr\{N_1^m < \epsilon\} \le P m (2\epsilon)^n.$$



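Now

$$\begin{aligned}
\Pr\{D < \epsilon\} &\le m \Pr\{D < \epsilon \wedge x_1 \in \arg\min_{i \ne j} d_X(x_i, x_j)\}\\
&\le m \Pr\{N_1^m < \epsilon\}\\
&\le P m^2 (2\epsilon)^n.
\end{aligned}$$

Thus, for a given $\delta > 0$, with probability at least $1 - \delta$,

$$D \ge \frac{1}{2}\left(\frac{\delta}{P m^2}\right)^{1/n}.$$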
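The tail bound $\Pr\{\text{nearest pair} < \epsilon\} \le P m^2 (2\epsilon)^n$ can be sanity-checked by simulation. The sketch assumes the uniform density on $[0,1]^n$ (so $P = 1$) and the $l_\infty$ norm, whose ball of radius $\epsilon$ is exactly a cube of side $2\epsilon$. Both choices, the parameter values and the function names are illustrative assumptions. A fixed seed makes the run reproducible.

```python
import itertools
import random

def nearest_pair_linf(points):
    """Smallest l_inf distance between two of the points."""
    return min(max(abs(a - b) for a, b in zip(p, q))
               for p, q in itertools.combinations(points, 2))

def empirical_probability(m, n, eps, trials, seed=0):
    """Fraction of trials in which m uniform points in [0,1]^n have a pair closer than eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = [[rng.random() for _ in range(n)] for _ in range(m)]
        if nearest_pair_linf(pts) < eps:
            hits += 1
    return hits / trials

def distance_threshold(delta, m, n, P=1.0):
    """Solve P m^2 (2 eps)^n = delta for eps, i.e. the bound (1/2)(delta / (P m^2))^(1/n)."""
    return 0.5 * (delta / (P * m * m)) ** (1 / n)

m, n, eps = 20, 2, 0.01
bound = 1.0 * m * m * (2 * eps) ** n   # P m^2 (2 eps)^n with P = 1
print(empirical_probability(m, n, eps, trials=2000), "<=", bound)
```

The union bound is loose here: the simulated frequency comes out well below the bound.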



Since any two norms $|\cdot|_1, |\cdot|_2$ on $\mathbb{R}^n$ are equivalent, that is, $a|x|_1 \le |x|_2 \le b|x|_1$ for some positive constants $a, b$ [45], for any norm on $\mathbb{R}^n$:

$$D \ge C\left(\frac{\delta}{P m^2}\right)^{1/n}$$

for some $C > 0$. For distribution-metric pairs $(P_X, d_X)$ for which the bound

$$D \ge C\left(\frac{\delta}{m^k}\right)^{1/\dim X}$$

holds with confidence $1 - \delta$, that is, for all samples except a set of probability at most $\delta$, a greedily chosen memory-based learning system of size

$$O\left(s \cdot \left(\log\frac{K \cdot \operatorname{diam} Y}{\epsilon} + \frac{1}{\dim X}\log\frac{m^k}{\delta}\right)\right) \qquad (2.6)$$

can meet prespecified accuracy and confidence bounds given by the parameters $\epsilon$ and $1 - \delta$. To achieve this we choose $m$, as specified in (2.3), for a confidence parameter of $1 - \delta/2$. We also choose the size of the greedily selected approximating set specified in (2.6) with respect to confidence $1 - \delta/2$. The asymptotic size of the memory-based learning system is then given by (2.6). This is a smaller system than that produced by the algorithm proposed by Lin and Vitter, the size of which is given by (2.4).

2.4.3 How to Prove That it Works 

This section gives an outline of Lin and Vitter's correctness proof for the learning algorithm described in Section 2.4.1.

First we quote two definitions after Haussler [34, 35]. 

For $r \in \mathbb{R}$ let $\mathrm{sign}(r) = 1$ iff $r > 0$, and zero otherwise.

Definition 2.4.1 For $A \subseteq \mathbb{R}^m$, say $A$ is full if there exists an $x \in \mathbb{R}^m$ such that the set of sign vectors of the following sums is of the maximum size possible:

$$\left|\left\{\left(\mathrm{sign}(y_i + x_i)\right)_{i=1}^{m} : y \in A\right\}\right| = 2^m.$$

Definition 2.4.2 Let $\mathcal{F}$ be a class of functions from a set $X$ into $\mathbb{R}$. For any sequence $\xi_X = (x_1, \dots, x_m)$ of points in $X$, let $\mathcal{F}(\xi_X) = \{(f(x_1), \dots, f(x_m)) : f \in \mathcal{F}\}$. If $\mathcal{F}(\xi_X)$ is full we say that $\xi_X$ is shattered by $\mathcal{F}$. The pseudo-dimension of $\mathcal{F}$, denoted by $\dim_P \mathcal{F}$, is the largest $m$ such that there exists a sequence of $m$ points in $X$ that is shattered by $\mathcal{F}$. If arbitrarily long sequences are shattered, then $\dim_P \mathcal{F}$ is infinite.
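For a finite class on a finite domain, Definitions 2.4.1 and 2.4.2 can be checked by brute force. The sketch below is only an illustration with hypothetical function names. Because $\mathrm{sign}(y_i + x_i)$ depends only on where the threshold $-x_i$ falls among the values $y_i$, finitely many candidate shifts suffice. Threshold functions have pseudo-dimension (VC dimension) 1, and a small grid of affine functions on three points has pseudo-dimension 2.

```python
from itertools import combinations, product

def sign(r):
    """sign(r) = 1 iff r > 0, and 0 otherwise, as defined in the text."""
    return 1 if r > 0 else 0

def is_full(A):
    """Brute-force test of Definition 2.4.1 for a finite nonempty set A of m-vectors."""
    m = len(A[0])
    # sign(y_i + x_i) = 1 iff y_i > t_i with t_i = -x_i, so it suffices to try
    # t_i below every value, or equal to one of the values y_i.
    candidates = []
    for i in range(m):
        values = sorted({y[i] for y in A})
        candidates.append([values[0] - 1] + values)
    for t in product(*candidates):
        patterns = {tuple(sign(y[i] - t[i]) for i in range(m)) for y in A}
        if len(patterns) == 2 ** m:
            return True
    return False

def pseudo_dimension(functions, domain):
    """Largest m such that some m distinct points of the finite domain are shattered."""
    for m in range(len(domain), 0, -1):
        for xs in combinations(domain, m):
            A = list({tuple(f(x) for x in xs) for f in functions})
            if is_full(A):
                return m
    return 0

domain = [0, 1, 2, 3]
thresholds = [lambda x, t=t: 1 if x >= t else 0 for t in range(5)]
affine = [lambda x, a=a, b=b: a * x + b for a in (-1, 0, 1) for b in (-1, 0, 1)]
print(pseudo_dimension(thresholds, domain), pseudo_dimension(affine, domain[:3]))  # 1 2
```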

If $\mathcal{F}$ is a class of $\{0, 1\}$-valued functions, then the definition of the pseudo-dimension is the same as that of the VC dimension. Haussler and Long [33] showed an upper bound on the sample complexity required to guarantee, with confidence $1 - \delta$, the uniform convergence of the empirical estimates of a given family of functions with a bounded pseudo-dimension. Lin and Vitter show that the pseudo-dimension of Voronoi encoders of size at most $s$ is $O(\dim X \cdot s \log s)$. Note that an $\epsilon/K$-good Voronoi encoder guarantees an $\epsilon$-good Voronoi system, by the Lipschitz condition.

Choosing $s = Q_{P_X}(X, \epsilon', d_X)$, for a suitable $\epsilon'$ proportional to $\epsilon/K$, they ensure that there exists an $\epsilon'$-good Voronoi encoder of size $s$. Then, by drawing a sample of the size required by Haussler and Long, they guarantee that with high confidence the empirically best Voronoi encoder of size $s$ is accurate to within a constant multiple of $\epsilon'$. Hence a solution to the $s$-median problem would produce a Voronoi system that is good to within a fraction of $\epsilon$. Since a solution is generally NP-hard to find, they output an approximation that yields an $\epsilon$-good system.

2.5 Conclusion 

One of the fundamental problems of AI is filtering out redundant information. Operations 
researchers have investigated this problem as modeled by a Coalitional Game. In this model 
a sufficient condition was found for the existence of a uniform bound on the performance 
of the greedy approximation heuristic. The same condition on the game, monotonicity 
and concavity, implies a uniform bound even when approximate rather than precise values 
of coalitions are known. An s-median problem can be mapped to a game satisfying the 
condition. We use this to derive bounds on the quality of a greedy approximate solution 
to the s-median problem. We argue that in the context of memory-based learning of 
Lipschitz functions the greedy approximation algorithm is an attractive alternative to the 
approximation technique proposed by Lin and Vitter [42]. 

Further exploration of the greedy heuristic as well as other simple data processing tech- 
niques may contribute, we conjecture, to a better understanding of conscious intelligence. 
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Chapter 3 



Scapegoat Trees 
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3.1 Introduction 

There is a vast number of schemes available for implementing a "dictionary" - supporting the operations INSERT, DELETE, and SEARCH - using balanced binary search trees. Mehlhorn and Tsakalidis [49] survey the recent literature on such data structures. In this chapter we propose a new method that achieves optimal amortized costs for update operations (INSERT and DELETE) and optimal worst-case cost for SEARCH, without requiring the extra information (e.g. colors or weights) normally required by many balanced-tree schemes. This is the first method ever proposed that achieves a worst-case search time of O(log n) without using such extra information, while maintaining optimal amortized update costs. In addition, the method is quite simple and practical.

In their comparative study, Baer and Schwab [7] distinguish height-balanced schemes from weight-balanced schemes based on the criterion that triggers restructuring.

In a height-balanced structure the extra information stored at each node helps to enforce 
a bound on the overall height of the tree by bounding the height of subtrees. Red-black trees, invented by Bayer [9] and refined by Guibas and Sedgewick [29], are
an elegant example of the height-balanced approach. Red-black trees implement the basic 
dictionary operations with a worst-case cost of O(log n) per operation, at the cost of storing 
one extra bit (the "color" of the node) at each node. AVL trees [1] are another well-known 
example of height-balanced trees. 

Other schemes are weight-balanced in that the size of subtrees causes restructuring. 
By ensuring that the weights of siblings are approximately equal, an overall bound on the 
height of the tree is enforced. Nievergelt and Reingold [53] introduce such trees and present 
algorithms for implementing the basic dictionary operations in O(log n) worst-case time. 

The first published data structure that does not store any extra information at each node is the splay tree, due to Sleator and Tarjan [64]. Splay trees achieve O(log n) amortized complexity
per operation. However, splay trees do not guarantee a logarithmic worst-case bound on 
the cost of a SEARCH, and require restructuring even during searches (unlike scapegoat 
trees, which do have a logarithmic worst-case cost of a SEARCH and do not restructure the 
tree during searches). Splay trees do have other desirable properties that make them of 
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considerable practical and theoretical interest, however, such as their near-optimality when 
handling an arbitrary sequence of operations. 

Our algorithm modifies the weight-balanced method of Varghese [20, Problem 18-3], who presents an algorithm for maintaining weight-balanced trees with amortized cost O(log n) per operation. Our scheme combines the notions of height-balance and weight-balance to achieve an effective algorithm, without storing either height information or weight information at any node. It is most similar to Andersson's GB (c) trees [3]. His first publication [2] shortly preceded our independent discovery.

Both GB (c) trees and scapegoat trees use total rebuilding of subtrees to enforce an 
upper bound on the depth of the tree, and achieve the same asymptotic performance for 
the dictionary operations. Both schemes require no balancing information to be kept at the 
nodes. Andersson's restructuring is triggered by a height condition. We have rediscovered 
his restructuring scheme, yet we also present a more general weight-based condition. The 
maintenance algorithm for scapegoat trees, like that for GB (c) trees, occasionally rebuilds 
the whole tree to preserve the depth guarantee in the face of deletions. The condition 
used in scapegoat trees to trigger restructuring of the whole tree is advantageous in that it 
requires less frequent restructuring to enforce the same depth bound. 

Yet another advantage of our scheme is demonstrated by the following scenario, based on a true story. Consider a company, ComputerPeak Inc., that uses a plain binary search tree algorithm for its small databases. One day a decision is made to upgrade the unbalanced trees' approach. Using scapegoat trees, the upgrade can be carried out without changing the format of the data and without throwing out old code: the old code can be used as a subroutine of the new scapegoat structure. Although this scenario may not be very likely, the same property of our data structure can prove useful when a new system is first coded. It suggests a natural break-up of the code's development into two phases, the first of which produces code that supports all of the system's features except performance.

We show that scapegoat balancing can be used for a variety of tree-based data structures: Bentley's [12] k-d trees, Lueker's [40] trees for orthogonal queries, and Finkel and Bentley's [25] quad trees. For all of these, a method of balancing that does not resort to extraneous balancing information at the nodes was not previously known.
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We include the first experimental study of a tree-based data structure that maintains 
balance by partial rebuilding without storing auxiliary information at the nodes. Our 
experimental results suggest how scapegoat trees can be tuned for optimal performance in 
practice. We also compare them to other tree-based solutions of the dictionary problem. 
Scapegoat trees show performance superior to splay trees and for some inputs even to 
the more conventional red-black trees which store auxiliary balancing information at every 
node. 

Section 3.2 introduces the basic scapegoat data structure, and some notations. Sec- 
tion 3.4 describes the algorithm for maintaining scapegoat trees and outlines the proof of 
their features. Section 3.5 proves the complexity claims. Section 3.6 describes an algorithm 
for rebuilding a binary search tree in linear time and logarithmic space. In Section 3.7 we 
show how our techniques can be used in three known multi-key tree-based data structures, 
and state weak conditions that suffice to allow its application to other data structures. In 
Section 3.7.2 we show how an existing binary search tree database can be upgraded to a scapegoat tree database without modifying the data format, while reusing existing code.
Section 3.9 includes a detailed comparison of Andersson's GB (c) trees to scapegoat trees. 
Section 3.10 reports the results of experimental evaluation of scapegoat trees. We compare 
a few variants of the scapegoat algorithm to each other and also compare it to other algo- 
rithms for maintenance of binary search trees. Finally, Section 3.11 concludes with some 
discussion and open problems. 

3.2 Notations 

In this section we describe the data structure of a scapegoat tree. Basically, a scapegoat 
tree consists of an ordinary binary search tree, with two extra values stored at the root. 
Each node x of a scapegoat tree maintains the following attributes: 

• key[x] - The key stored at node x.

• left[x] - The left child of x. 

• right[x] - The right child of x. 
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We'll also use the notations: 

• size(x) - the size of the sub-tree rooted at x (i.e., the number of keys stored in this 
sub-tree including the key stored at x). 

• brother(x) - the brother of node x; the other child of x's parent or NIL. 

• h(x) and h(T) - height of a node and a tree respectively. The height of a node is the 
length of the longest path from that node to a leaf. The height of a tree is the height 
of its root. 

• d(x) - depth of node x. The depth of a node is the length (number of edges) of the 
path from the root to that node. (The root node is at depth 0.) 

Note that values actually stored as fields in a node are used with brackets, whereas 
values that are computed as functions of the node use parentheses; each node only stores 
three values: key, left, and right. Computing brother(x) requires knowledge of x's parent. 
Most importantly, size(x) is not stored at x, but can be computed in time O(size(x)) as
necessary. 

The tree T as a whole has the following attributes: 

• root[T] - A pointer to the root node of the tree. 

• size[T] - The number of nodes in the tree. This is the same as size(root[T]). In our complexity analyses we also denote size[T] by n.

• max_size[T] - The maximal value of size[T] since the last time the tree was completely rebuilt. If DELETE operations are not performed, then the max_size attribute is not necessary.
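The node and tree attributes above can be sketched as follows. This is a minimal sketch with hypothetical class and field names (`Node`, `Tree`, `tree_size`), not the thesis's own code. It stores only `key`, `left` and `right` at a node and recomputes `size(x)` by traversal when needed. `brother(x)` is omitted because it needs the parent, which the scheme recovers from the search path.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A scapegoat-tree node stores only key[x], left[x] and right[x]."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def size(x: Optional[Node]) -> int:
    """size(x) is not stored; it is recomputed by a traversal in O(size(x)) time."""
    return 0 if x is None else 1 + size(x.left) + size(x.right)

def height(x: Optional[Node]) -> int:
    """h(x): edges on the longest downward path to a leaf (-1 for an empty tree)."""
    return -1 if x is None else 1 + max(height(x.left), height(x.right))

@dataclass
class Tree:
    """The tree-level attributes root[T], size[T] and max_size[T]."""
    root: Optional[Node] = None
    tree_size: int = 0   # size[T], also denoted n
    max_size: int = 0    # largest size[T] since the last complete rebuild

root = Node(2, Node(1), Node(3, None, Node(4)))
T = Tree(root, tree_size=4, max_size=4)
print(size(T.root), height(T.root))  # 4 2
```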

3.3 Preliminary Discussion 

SEARCH, INSERT and DELETE operations on scapegoat trees are performed in the usual 
way for binary search trees, except that, occasionally, after an update operation (INSERT 
or DELETE) the tree is restructured to ensure that it contains no "deep" nodes. 
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A binary-tree node x is said to be a-weight-balanced, for some a, 1/2 < a < 1, if 
both 

size (left [x]) < a • size (x), and (3-1) 

size (right [x]) < a- size (x) . (3-2) 

We call a tree a- weight-balanced if, for a given value of a, 1/2 < a < 1, all the nodes 
in it are a-weight-balanced. Intuitively, a tree is a-weight-balanced if, for any subtree, the 
sizes of its left and right subtree are approximately equal. 
We denote 

h a (n) = [log ( i/ a) ra J, 

and say that a tree T is a-height-balanced if it satisfies 

h(T) < h a (n), (3.3) 

where n = sizeiT). Intuitively, a tree is a-height-balanced if its height is not greater than 
that of the heighest a-weight-balanced tree of the same size. The following standard claim 
justifies this interpretation. 

Claim 3.3.1 If $T$ is an $\alpha$-weight-balanced binary search tree, then $T$ is $\alpha$-height-balanced.
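The definitions of $h_\alpha$, weight balance and height balance translate directly into checks. The sketch below uses hypothetical helper names. It computes $h_\alpha(n)$ exactly with rationals, to avoid floating-point trouble when $n$ is close to a power of $1/\alpha$. It also shows that the implication of Claim 3.3.1 is one-directional: a three-node chain is 0.6-height-balanced but not 0.6-weight-balanced.

```python
from fractions import Fraction

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def size(x):
    return 0 if x is None else 1 + size(x.left) + size(x.right)

def height(x):
    return -1 if x is None else 1 + max(height(x.left), height(x.right))

def h_alpha(n, alpha):
    """floor(log_{1/alpha} n) for n >= 1, computed exactly on rationals."""
    inv = 1 / Fraction(alpha)
    h, power = 0, inv
    while power <= n:          # invariant: (1/alpha)^h <= n
        h, power = h + 1, power * inv
    return h

def weight_balanced(x, alpha):
    """True if every node of the subtree at x satisfies (3.1) and (3.2)."""
    if x is None:
        return True
    a, s = Fraction(alpha), size(x)
    return (size(x.left) <= a * s and size(x.right) <= a * s
            and weight_balanced(x.left, alpha) and weight_balanced(x.right, alpha))

def height_balanced(root, alpha):
    """Condition (3.3): h(T) <= h_alpha(n)."""
    return height(root) <= h_alpha(size(root), alpha)

balanced = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))
print(h_alpha(17, "0.57"), h_alpha(18, "0.57"))                            # 5 5
print(weight_balanced(balanced, "0.6"), height_balanced(balanced, "0.6"))  # True True
print(weight_balanced(chain, "0.6"), height_balanced(chain, "0.6"))        # False True
```

Passing $\alpha$ as a string such as `"0.57"` makes `Fraction` represent the decimal value exactly.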

Although scapegoat trees are not guaranteed to be $\alpha$-weight-balanced at all times, they are loosely $\alpha$-height-balanced, in that they satisfy the bound

$$h(T) \le h_\alpha(T) + 1, \qquad (3.4)$$

where $h_\alpha(T)$ is a shorthand for $h_\alpha(\text{size}[T])$.

We assume from now on that a fixed $\alpha$, $1/2 < \alpha < 1$, has been chosen. For this given $\alpha$, we call a node of depth greater than $h_\alpha(T)$ a deep node. In our scheme the detection of a deep node triggers a restructuring operation.
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3.4 Operations on Scapegoat Trees 

3.4.1 Searching a Scapegoat Tree. 

In a scapegoat tree, SEARCH operations proceed as in an ordinary binary search tree. No 
restructuring is performed. 

3.4.2 Inserting into a Scapegoat Tree. 

To insert a node into a scapegoat tree, we insert it as we would into an ordinary binary search tree, increment size[T], and set max_size[T] to be the maximum of size[T] and max_size[T]. Then, if the newly inserted node is deep, we rebalance the tree as follows.

Let $x$ be the newly inserted deep node, and in general let $x_{i+1}$ denote the parent of $x_i$ (so $x_0 = x$). We climb the tree, examining $x_0, x_1, x_2$, and so on, until we find a node $x_i$ that is not $\alpha$-weight-balanced. Since $x$ is a leaf, $\text{size}(x_0) = 1$. We compute $\text{size}(x_{j+1})$ using the formula

$$\text{size}(x_{j+1}) = \text{size}(x_j) + \text{size}(\text{brother}(x_j)) + 1 \qquad (3.5)$$

for $j = 0, 1, \dots, i-1$, using additional recursive searches.

We call $x_i$, the ancestor of $x$ that was found not to be $\alpha$-weight-balanced, the scapegoat node. A scapegoat node must exist, by Claim 3.5.1 below.

Once the scapegoat node $x_i$ is found, we rebuild the subtree rooted at $x_i$. To rebuild a subtree is to replace it with a 1/2-weight-balanced subtree containing the same nodes. This can be done easily in time $O(\text{size}(x_i))$. Section 3.6 describes how this can be done in space $O(\log n)$ as well.
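The insertion procedure can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: $\alpha = 2/3$ is an arbitrary default, duplicate keys go right, and the names are hypothetical. The ancestors are remembered on a stack instead of parent pointers, sizes are accumulated with formula (3.5), and the first ancestor that is not $\alpha$-weight-balanced (either child too heavy) becomes the scapegoat. The rebuild uses a plain in-order flatten, which takes $O(\text{size})$ extra space rather than the $O(\log n)$ method of Section 3.6.

```python
from fractions import Fraction

class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def size(x):
    return 0 if x is None else 1 + size(x.left) + size(x.right)

def height(x):
    return -1 if x is None else 1 + max(height(x.left), height(x.right))

def h_alpha(n, alpha):
    """floor(log_{1/alpha} n), exact for a Fraction alpha."""
    inv = 1 / alpha
    h, power = 0, inv
    while power <= n:
        h, power = h + 1, power * inv
    return h

def flatten(x, out):
    """In-order traversal: appends the nodes of the subtree at x in key order."""
    if x is not None:
        flatten(x.left, out)
        out.append(x)
        flatten(x.right, out)
    return out

def build(nodes, lo, hi):
    """Relink nodes[lo:hi] into a 1/2-weight-balanced tree (middle node as root)."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    root = nodes[mid]
    root.left = build(nodes, lo, mid)
    root.right = build(nodes, mid + 1, hi)
    return root

class ScapegoatTree:
    def __init__(self, alpha=Fraction(2, 3)):
        self.alpha = Fraction(alpha)
        self.root, self.size, self.max_size = None, 0, 0

    def insert(self, key):
        path = []                        # ancestors of the new node, kept on a stack
        x = self.root
        while x is not None:
            path.append(x)
            x = x.left if key < x.key else x.right
        new = Node(key)
        if not path:
            self.root = new
        elif key < path[-1].key:
            path[-1].left = new
        else:
            path[-1].right = new
        self.size += 1
        self.max_size = max(self.max_size, self.size)
        if len(path) > h_alpha(self.size, self.alpha):   # the new node is deep
            self._rebuild_at_scapegoat(new, path)

    def _rebuild_at_scapegoat(self, x, path):
        child, child_size = x, 1                          # size(x_0) = 1 for the new leaf
        for i in range(len(path) - 1, -1, -1):
            anc = path[i]
            brother = anc.right if anc.left is child else anc.left
            brother_size = size(brother)                  # the extra search of (3.5)
            anc_size = child_size + brother_size + 1      # formula (3.5)
            if max(child_size, brother_size) > self.alpha * anc_size:
                nodes = flatten(anc, [])                  # anc is the scapegoat
                subtree = build(nodes, 0, len(nodes))
                if i == 0:
                    self.root = subtree
                elif path[i - 1].left is anc:
                    path[i - 1].left = subtree
                else:
                    path[i - 1].right = subtree
                return
            child, child_size = anc, anc_size

t = ScapegoatTree()
for k in range(1, 101):
    t.insert(k)          # ascending keys: the worst case for a plain binary search tree
print(t.size, height(t.root), h_alpha(t.size, t.alpha))
```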

An alternative way to find a scapegoat node. 

As can be seen in Figure 3.4.2, $x$ might have more than one weight-unbalanced ancestor. Any weight-unbalanced ancestor of $x$ may be chosen to be the scapegoat. Here we show that another way of finding a weight-unbalanced ancestor $x_i$ of $x$ is to find the deepest ancestor of $x$ satisfying the condition

$$i > h_\alpha(\text{size}(x_i)). \qquad (3.6)$$

Figure 3-1: The initial tree, $T$. For $\alpha = 0.57$, $h_\alpha(17) = h_\alpha(18) = 5$, and $T$ is loosely $\alpha$-height-balanced (because node 10 is at depth 6). Nodes 2, 5, 6, 12, 15 and 16 are currently weight-unbalanced. Inserting 8 into this tree triggers a rebuild. We chose node 6 to be the scapegoat node.

Since this ancestor will often be higher in the tree than the first weight-unbalanced ancestor, it may tend to yield more balanced trees on the average. (In our experiments this heuristic performed better than choosing the first weight-unbalanced ancestor to be the scapegoat.) Inequality (3.6) is satisfied when $x_i = \text{root}[T]$, hence this scheme will always find a scapegoat node. The scapegoat node found is indeed weight-unbalanced by Claim 3.5.2.

Note that applying condition (3.6) when searching for the scapegoat in the example of Figure 3-1 indeed results in node 6 being rebuilt, since it is the first ancestor of node 8 that satisfies the inequality.
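Condition (3.6) can be sketched the same way. In this hypothetical Python fragment (the names `h_alpha` and `scapegoat_by_depth` and the `path` convention are our assumptions), we take $h_\alpha(n) = \lfloor \log_{1/\alpha} n \rfloor$, as the values in the caption of Figure 3-1 suggest, and compute the floor by repeated multiplication instead of calling a logarithm.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def h_alpha(n: int, alpha: float) -> int:
    """floor(log_{1/alpha} n) for n >= 1, by repeated multiplication."""
    h, power = 0, 1.0
    while power / alpha <= n:
        power /= alpha
        h += 1
    return h


def subtree_size(node: Optional[Node]) -> int:
    return 0 if node is None else 1 + subtree_size(node.left) + subtree_size(node.right)


def scapegoat_by_depth(path: List[Node], alpha: float) -> Tuple[Node, int]:
    """Return (x_i, size(x_i)) for the first ancestor of x = path[-1] with
    i > h_alpha(size(x_i)), i.e. condition (3.6). If x is deep, the root
    satisfies the condition, so the loop always returns."""
    child, child_size = path[-1], 1  # x_0 is a leaf
    for i in range(1, len(path)):
        parent = path[-1 - i]  # parent is x_i
        brother = parent.right if parent.left is child else parent.left
        parent_size = child_size + subtree_size(brother) + 1  # formula (3.5)
        if i > h_alpha(parent_size, alpha):
            return parent, parent_size
        child, child_size = parent, parent_size
    raise ValueError("x is not deep: no ancestor satisfies (3.6)")
```

With $\alpha = 0.57$, `h_alpha(17, 0.57)` and `h_alpha(19, 0.57)` both return 5, in agreement with the caption of Figure 3-1.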

3.4.3 Deleting from a Scapegoat Tree. 

Deletions are carried out by first deleting the node as we would from an ordinary binary search tree, and decrementing size[T]. Then, if

$$\mathrm{size}[T] < \alpha \cdot \mathrm{max\_size}[T] \qquad (3.7)$$

we rebuild the whole tree, and reset max_size[T] to size[T].
3.4.4 Remarks.

• Every time the whole tree is rebuilt, max_size[T] is set to size[T].

• Note that $h_\alpha(T)$ is easily computed from the information stored at the root. (Indeed, it could even be stored there as an extra attribute.)

• We do not need explicit parent fields in the nodes to find the scapegoat node, since we are just climbing back up the path we came down to insert the new node; the nodes $x_i$ on this path can be remembered on a stack.
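Putting the pieces of Section 3.4 together, the following Python sketch implements INSERT (with the scapegoat chosen by condition (3.6)) and DELETE (with the whole-tree rebuild triggered by inequality (3.7)). It is a minimal sketch, not the thesis's implementation: it assumes distinct integer keys, a fixed $1/2 < \alpha < 1$, that a node is deep when its depth exceeds $h_\alpha(\mathrm{size}[T])$, and it rebuilds through an auxiliary Python list (the simple array-based method mentioned at the start of Section 3.6) rather than in place. All names are hypothetical.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key, self.left, self.right = key, None, None

def h_alpha(n: int, alpha: float) -> int:  # floor(log_{1/alpha} n); 0 when n <= 1
    h, power = 0, 1.0
    while power / alpha <= n:
        power /= alpha
        h += 1
    return h

def flatten(node: Optional[Node], out: List[Node]) -> List[Node]:
    # append the nodes of the subtree rooted at node to out, in key order
    if node is not None:
        flatten(node.left, out)
        out.append(node)
        flatten(node.right, out)
    return out

def build(nodes: List[Node], lo: int, hi: int) -> Optional[Node]:
    # relink nodes[lo:hi] into a 1/2-weight-balanced tree and return its root
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    nodes[mid].left = build(nodes, lo, mid)
    nodes[mid].right = build(nodes, mid + 1, hi)
    return nodes[mid]

class ScapegoatTree:
    def __init__(self, alpha: float = 0.57) -> None:
        self.alpha = alpha                # assumed to satisfy 1/2 < alpha < 1
        self.root: Optional[Node] = None
        self.size = self.max_size = 0     # size[T] and max_size[T]

    def insert(self, key: int) -> bool:
        path: List[Node] = []             # ancestors, remembered on a stack
        node = self.root
        while node is not None:
            if key == node.key:
                return False              # keys are assumed to be distinct
            path.append(node)
            node = node.left if key < node.key else node.right
        new = Node(key)
        if not path:
            self.root = new
        elif key < path[-1].key:
            path[-1].left = new
        else:
            path[-1].right = new
        path.append(new)
        self.size += 1
        self.max_size = max(self.max_size, self.size)
        if len(path) - 1 > h_alpha(self.size, self.alpha):  # new node is deep
            self._rebuild_scapegoat(path)
        return True

    def _rebuild_scapegoat(self, path: List[Node]) -> None:
        child, child_size = path[-1], 1   # x_0 is a leaf
        for i in range(1, len(path)):
            parent = path[-1 - i]         # parent is x_i
            brother = parent.right if parent.left is child else parent.left
            parent_size = child_size + len(flatten(brother, [])) + 1  # (3.5)
            if i > h_alpha(parent_size, self.alpha):                 # (3.6)
                new_root = build(flatten(parent, []), 0, parent_size)
                if i == len(path) - 1:
                    self.root = new_root
                elif path[-2 - i].left is parent:
                    path[-2 - i].left = new_root
                else:
                    path[-2 - i].right = new_root
                return
            child, child_size = parent, parent_size

    def delete(self, key: int) -> bool:
        parent, node = None, self.root
        while node is not None and node.key != key:
            parent, node = node, (node.left if key < node.key else node.right)
        if node is None:
            return False
        if node.left is not None and node.right is not None:
            succ_parent, succ = node, node.right      # in-order successor
            while succ.left is not None:
                succ_parent, succ = succ, succ.left
            node.key = succ.key                       # move its key up, unlink it
            parent, node = succ_parent, succ
        child = node.left if node.left is not None else node.right
        if parent is None:
            self.root = child
        elif parent.left is node:
            parent.left = child
        else:
            parent.right = child
        self.size -= 1
        if self.size < self.alpha * self.max_size:      # inequality (3.7)
            self.root = build(flatten(self.root, []), 0, self.size)
            self.max_size = self.size
        return True
```

Because the ancestors are kept in the `path` list, the nodes need no parent fields, as the last remark above observes.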

3.5 Correctness and Complexity 

Now we prove that the algorithm just described is indeed correct and analyze its complexity.

3.5.1 Correctness. 

The following two claims prove that the algorithm is indeed correct. 

The first claim guarantees that a deep node has an ancestor that is not $\alpha$-weight-balanced.

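Claim 3.5.1 If $x$ is a node at depth greater than $h_\alpha(T)$, then there is an $\alpha$-weight-unbalanced ancestor of $x$.

Proof: Suppose, to the contrary, that every ancestor of $x$ is $\alpha$-weight-balanced. Then, according to equations (3.1), whenever a node $z$ on the path from $x$ to the root is a child of $y$,

$$\mathrm{size}(z) \le \alpha \cdot \mathrm{size}(y).$$

By induction on the path from $x$ to the root, $\mathrm{size}(x) \le \alpha^{d(x)} \cdot \mathrm{size}[T]$. Since $\mathrm{size}(x) \ge 1$, the depth $d(x)$ of $x$ is at most $\log_{1/\alpha} \mathrm{size}[T]$, and since $d(x)$ is an integer, $d(x) \le h_\alpha(T)$. This contradicts the assumption on the depth of $x$, establishing the claim. □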

The following claim proves that a scapegoat node found using inequality (3.6) is weight- 
unbalanced. 



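Claim 3.5.2 If a binary tree $T$ contains a node $x$ at depth greater than $h_\alpha(n)$, then the deepest ancestor $x_i$ of $x$ that is not $\alpha$-height-balanced is not $\alpha$-weight-balanced either.

Proof: We chose $x_i$ so that the following inequalities are satisfied:

$$i > h_\alpha(\mathrm{size}(x_i)),$$

and

$$i - 1 \le h_\alpha(\mathrm{size}(x_{i-1})).$$

Subtracting the second inequality from the first gives

$$1 > h_\alpha(\mathrm{size}(x_i)) - h_\alpha(\mathrm{size}(x_{i-1})) = \left\lfloor \log_{1/\alpha} \mathrm{size}(x_i) \right\rfloor - \left\lfloor \log_{1/\alpha} \mathrm{size}(x_{i-1}) \right\rfloor \ge \left\lfloor \log_{1/\alpha} \frac{\mathrm{size}(x_i)}{\mathrm{size}(x_{i-1})} \right\rfloor.$$

Hence $\log_{1/\alpha}(\mathrm{size}(x_i)/\mathrm{size}(x_{i-1})) < 1$. Therefore,

$$\mathrm{size}(x_{i-1}) > \alpha \cdot \mathrm{size}(x_i),$$

and since $x_{i-1}$ is a child of $x_i$, the node $x_i$ is not $\alpha$-weight-balanced. □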
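The key step of this proof, that $i > h_\alpha(\mathrm{size}(x_i))$ and $i - 1 \le h_\alpha(\mathrm{size}(x_{i-1}))$ force $\mathrm{size}(x_{i-1}) > \alpha \cdot \mathrm{size}(x_i)$, can be sanity-checked by brute force over small sizes. The Python check below is our own illustration (the function names are hypothetical) and is no substitute for the proof; it uses the observation that a suitable integer $i$ exists exactly when $h_\alpha(\mathrm{size}(x_i)) \le h_\alpha(\mathrm{size}(x_{i-1}))$.

```python
def h_alpha(n: int, alpha: float) -> int:
    """floor(log_{1/alpha} n) for n >= 1, by repeated multiplication."""
    h, power = 0, 1.0
    while power / alpha <= n:
        power /= alpha
        h += 1
    return h


def claim_3_5_2_holds(alpha: float, max_size: int) -> bool:
    """Check the inequality step of Claim 3.5.2 for all child/parent sizes
    1 <= c < p <= max_size: whenever h_alpha(p) <= h_alpha(c) (so that some
    depth i has i > h_alpha(p) and i - 1 <= h_alpha(c)), require c > alpha * p."""
    h = [0] + [h_alpha(s, alpha) for s in range(1, max_size + 1)]
    for p in range(2, max_size + 1):
        for c in range(1, p):
            if h[p] <= h[c] and not c > alpha * p:
                return False  # would be a counterexample
    return True
```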
3.5.2 Complexity of Searching.

Since a scapegoat tree is loosely $\alpha$-height-balanced and $\alpha$ is fixed, a SEARCH operation takes worst-case time

$$O(h_\alpha(n)) = O(\log n).$$

No restructuring or rebalancing operations are performed during a SEARCH. Therefore, 
not only do scapegoat trees yield an O(log n) worst-case SEARCH time, but they should 
also be efficient in practice for SEARCH-intensive applications since no balancing overhead 
is incurred for searches. 

3.5.3 Complexity of Inserting. 

The following claim is key to the complexity analysis. 
Claim 3.5.3 The time to find the scapegoat node $x_i$ is $O(\mathrm{size}(x_i))$.

Proof: The dominant part of the cost of finding the scapegoat node $x_i$ is the cost of computing the values $\mathrm{size}(x_0), \mathrm{size}(x_1), \ldots, \mathrm{size}(x_i)$. Observe that with the optimized size calculations described in equation (3.5), each node in the subtree rooted at the scapegoat node $x_i$ is visited exactly once during these computations. □

We now analyze the situation where no DELETE operations are done; only INSERT and SEARCH operations are performed. The following claims yield Theorem 3.5.1, which shows that a scapegoat tree is always $\alpha$-height-balanced if no deletions are performed. The next claim asserts that rebuilding a tree does not make it deeper.

Claim 3.5.4 If T is a 1/2-weight-balanced binary search tree, then no tree of the same 
size has a smaller height. 

Proof: Straightforward. □ 

Claim 3.5.5 If the root of $T$ is not $\alpha$-weight-balanced, then its heavy subtree contains at least 2 nodes more than its light subtree.

Proof: Denote by $s_h$ and $s_l$ the sizes of the heavy and the light subtrees respectively. The root of the tree is not $\alpha$-weight-balanced, hence:

$$s_h > \alpha \cdot (s_h + s_l + 1).$$

This yields:

$$s_h > \frac{\alpha}{1 - \alpha} (s_l + 1).$$

Since $\alpha > 1/2$, we have $\alpha/(1 - \alpha) > 1$, so $s_h > s_l + 1$; and since $s_h$ and $s_l$ are both whole numbers, we get:

$$s_h \ge s_l + 2.$$

□

A tree $T$ is complete of height $h$ if a node cannot be added to $T$ without making its height greater than $h$. A complete tree of height $h$ has $2^{h+1} - 1$ nodes.
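Claim 3.5.5 is also easy to confirm exhaustively for small subtree sizes. The hypothetical Python check below enumerates heavy and light sizes and reports whether any pair with an $\alpha$-weight-unbalanced root has $s_h < s_l + 2$; it is an illustration only, under the assumption $\alpha \ge 1/2$.

```python
def claim_3_5_5_holds(alpha: float, limit: int) -> bool:
    """Check Claim 3.5.5 for every heavy size s_h < limit and light size s_l <= s_h:
    if the root is not alpha-weight-balanced, i.e. s_h > alpha * (s_h + s_l + 1),
    then s_h >= s_l + 2."""
    for s_h in range(limit):
        for s_l in range(s_h + 1):
            if s_h > alpha * (s_h + s_l + 1) and s_h < s_l + 2:
                return False  # a counterexample
    return True
```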
Claim 3.5.6 If $T$ is not $\alpha$-weight-balanced and $T$ contains only one node at depth $h(T)$, then rebuilding $T$ decreases its height.

Proof: Let $x$ be the deepest node of $T$, and let $T_l$ be the light subtree of $T$. Let $T_l'$ be the tree we get by removing $x$ from $T_l$ if $x$ is a node of $T_l$, or $T_l$ itself if $x$ is not a node of $T_l$. By Claim 3.5.5, $T_l'$ is not a complete tree of height $h(T) - 1$. Therefore, Claim 3.5.4 completes the proof. □

Theorem 3.5.1 If a scapegoat tree $T$ was created from a 1/2-weight-balanced tree by a sequence of INSERT operations, then $T$ is $\alpha$-height-balanced.

Proof: By induction on the number of INSERT operations using Claim 3.5.6. □

Let us now consider a sequence of $n$ INSERT operations, beginning with an empty tree. We wish to show that the amortized complexity per INSERT is $O(\log n)$.

For an overview of amortized analysis, see Cormen et al. [20]. We begin by defining a nonnegative potential function for the tree. Let

$$\Delta(x) = |\mathrm{size}(\mathrm{left}[x]) - \mathrm{size}(\mathrm{right}[x])|,$$

and define the potential of node $x$ to be 0 if $\Delta(x) < 2$, and $\Delta(x)$ otherwise. The potential of a 1/2-weight-balanced node is thus 0, and the potential of a node $x$ that is not $\alpha$-weight-balanced is $\Omega(\mathrm{size}(x))$. (Note that $\Delta(x)$ is not stored at $x$ nor explicitly manipulated during any update operations; it is just an accounting fiction representing the amount of "prepaid work" available at node $x$.) The potential of the tree is the sum of the potentials of its nodes.
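For concreteness, the potential defined above can be computed bottom-up in a single traversal. This Python sketch is our own (the `Node` layout and the function name are assumptions); it returns the size and the total potential of a subtree, so that $\Delta(x)$ is available at each node without storing anything in the tree, in keeping with the remark that $\Delta(x)$ is only an accounting fiction.

```python
from typing import Optional, Tuple


class Node:
    def __init__(self, left: Optional["Node"] = None, right: Optional["Node"] = None) -> None:
        self.left = left
        self.right = right


def size_and_potential(node: Optional[Node]) -> Tuple[int, int]:
    """Return (size, potential) of the subtree rooted at node, where a node's
    potential is 0 if Delta(x) < 2 and Delta(x) otherwise, with
    Delta(x) = |size(left[x]) - size(right[x])|."""
    if node is None:
        return 0, 0
    left_size, left_pot = size_and_potential(node.left)
    right_size, right_pot = size_and_potential(node.right)
    delta = abs(left_size - right_size)
    own = delta if delta >= 2 else 0
    return left_size + right_size + 1, left_pot + right_pot + own
```

A right-leaning chain of three nodes has potential 2 (all of it at the root), while a perfectly balanced tree has potential 0.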

It is easy to see that by increasing their cost by only a constant factor, the insertion 
operations that build up a scapegoat tree can pay for the increases in potential at the 
nodes. That is, whenever we pass by a node x to insert a new node as a descendant of x, 
we can pay for the increased potential in $x$ that may be required by the resulting increase in $\Delta(x)$.

The potential of the scapegoat node, like that of any node that is not $\alpha$-weight-balanced, is $\Omega(\mathrm{size}(x_i))$. Therefore, this potential is sufficient to pay for finding the scapegoat node and rebuilding its subtree. (Each of these two operations has complexity $O(\mathrm{size}(x_i))$, by Claim 3.5.3 and Section 3.6.) Furthermore, the potential of the rebuilt subtree is 0, so the entire initial potential may be used up to pay for these operations. This completes the proof of the following theorem.

Theorem 3.5.2 A scapegoat tree can handle a sequence of $n$ INSERT and $m$ SEARCH operations, beginning with a 1/2-weight-balanced tree, with $O(\log n)$ amortized cost per INSERT and $O(\log k)$ worst-case time per SEARCH, where $k$ is the size of the tree the SEARCH is performed on.

3.5.4 Complexity of Deleting. 

The main claim of this section, Claim 3.5.10, states that scapegoat trees are loosely $\alpha$-height-balanced (recall inequality (3.4)). Since we perform $\Omega(n)$ operations between two successive rebuilds due to DELETE operations, we can "pay" for these rebuilds in the amortized sense. Therefore, combining Claim 3.5.10 with the preceding results completes the proof of the following theorem.

Theorem 3.5.3 A scapegoat tree can handle a sequence of $n$ INSERT and $m$ SEARCH or DELETE operations, beginning with a 1/2-weight-balanced tree, with $O(\log n)$ amortized cost per INSERT or DELETE and $O(\log k)$ worst-case time per SEARCH, where $k$ is the size of the tree the SEARCH is performed on.

The first claim generalizes Theorem 3.5.1.

Claim 3.5.7 For any tree $T$, let $T' = \mathrm{INSERT}(T, x)$; then

$$h(T') \le \max(h_\alpha(T'), h(T)).$$

Proof: If the insertion of $x$ did not trigger a rebuild, then the depth of $x$ is at most $h_\alpha(T')$ and we are done.

Otherwise, suppose $x$ was initially inserted at depth $d$ in $T$, where $d > h_\alpha(T')$, thereby causing a rebuild. If $T$ already contained other nodes of depth $d$ we are done, since a rebuild does not make a tree deeper. Otherwise, the arguments in Section 3.5.1 and Claim 3.5.6 apply. □

Claim 3.5.8 If $h_\alpha(T)$ does not change during a sequence of INSERT and DELETE operations, then $\max(h_\alpha(T), h(T))$ is not increased by that sequence.

Proof: A DELETE operation cannot increase $\max(h_\alpha(T), h(T))$. For an INSERT we have

$$h(T') \le \max(h_\alpha(T'), h(T))$$

by Claim 3.5.7. Hence

$$\max(h_\alpha(T'), h(T')) \le \max(h_\alpha(T'), h(T)) = \max(h_\alpha(T), h(T)).$$

The claim follows by induction on the number of operations in the sequence. □

Claim 3.5.9 For $T' = \mathrm{INSERT}(T, x)$, if $T$ is loosely $\alpha$-height-balanced but is not $\alpha$-height-balanced, and $h_\alpha(T') = h_\alpha(T) + 1$, then $T'$ is $\alpha$-height-balanced.

Proof: We know that

$$h(T) = h_\alpha(T) + 1.$$

Hence

$$h(T) = h_\alpha(T').$$

Combining this with Claim 3.5.7 gives

$$h(T') \le h_\alpha(T'),$$

i.e., $T'$ is $\alpha$-height-balanced. □

Now we have the tools to prove the main claim of this section. 

Claim 3.5.10 A scapegoat tree built by INSERT and DELETE operations from an empty tree is always loosely $\alpha$-height-balanced.

Proof: Let $o_1, \ldots, o_n$ be a sequence of update operations that is applied to a 1/2-weight-balanced scapegoat tree, up until (but not including) the first operation, if any, that causes the entire tree to be rebuilt. To prove the claim it suffices to show that during this sequence of operations the tree is always loosely $\alpha$-height-balanced. During any sequence of update operations that do not change $h_\alpha(T)$, a loosely $\alpha$-height-balanced tree remains loosely $\alpha$-height-balanced, and an $\alpha$-height-balanced tree remains $\alpha$-height-balanced, by Claim 3.5.8. Therefore, let $o_{s_1}, \ldots, o_{s_k}$ be the subsequence (not necessarily consecutive) of operations that change $h_\alpha(T)$. An INSERT operation in this subsequence leaves the tree $\alpha$-height-balanced, by Claim 3.5.9. The use of max_size[T] in DELETE implies that there are no two successive DELETE operations in this subsequence, since the entire tree would be rebuilt no later than the second such DELETE operation. Therefore a DELETE operation in this subsequence must operate on an $\alpha$-height-balanced tree. Since the DELETE operation decreases $h_\alpha(T)$ by just one, the result is a loosely $\alpha$-height-balanced tree. The claim follows from applying the preceding claims in an induction on the number of operations. □
Proof (of Theorem 3.5.3): The proof of Theorem 3.5.2 can easily be modified to show that the amortized complexity of INSERT and DELETE operations is logarithmic. That is, the potential saved at the scapegoat node can "pay" for the cost of rebuilding and possibly searching in the amortized sense. An argument similar to that in the proof of Theorem 3.5.2 holds for DELETE-triggered rebuilding of the root.

By Claim 3.5.10 the height of a scapegoat tree is always logarithmic in the number of nodes. This accounts for the worst-case performance of SEARCHes claimed in the theorem. □

3.6 Rebuilding in Place 

A straightforward way of rebuilding a tree is to use a stack of logarithmic size to traverse the tree in-order in linear time and copy its nodes to an auxiliary array, and then to build the new 1/2-weight-balanced tree using a "divide and conquer" method. This yields $O(n)$ time and space complexity. Chang and Iyengar [31] survey a few techniques for rebuilding trees using logarithmic auxiliary space, and present additional algorithms.
The algorithms we present in this section are not included in their survey. All of the algorithms they present require two traversals of the tree. The non-recursive technique presented in Section 3.6.2 takes advantage of the fact that the input subtree is known to be of depth logarithmic in the size of the whole tree to perform rebuilding in a single pass.

3.6.1 A Simple Recursive Method. 

The first algorithm links the elements together into a list, rather than copying them into an array.

The initial tree-walk is implemented by the following procedure, FLATTEN. A call of the form FLATTEN(x, NIL) returns a list of the nodes in the subtree rooted at x, sorted in nondecreasing order. In general, a call of the form FLATTEN(x, y) takes as input a pointer x to the root of a subtree and a pointer y to the first node in a list of nodes (linked using their right pointer fields). The set of nodes in the subtree rooted at x and the set of nodes in the list headed by y are assumed to be disjoint. The procedure returns the list resulting from turning the subtree rooted at x into a list of nodes, linked by their right pointers, and appending the list headed by y to the result.

FLATTEN(x, y)
  if x = NIL
    then return y
  right[x] ← FLATTEN(right[x], y)
  return FLATTEN(left[x], x)

The procedure runs in time proportional to the number of nodes in the subtree, and in space proportional to its height.
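A direct Python rendering of FLATTEN follows. It is a sketch under our own assumptions (a minimal hypothetical `Node` class with `key`, `left` and `right` fields, and `None` for NIL). As in the pseudocode, the left fields of the listed nodes are left unchanged; BUILD-TREE overwrites them later.

```python
from typing import Optional


class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def flatten(x: Optional[Node], y: Optional[Node]) -> Optional[Node]:
    """FLATTEN(x, y): turn the subtree rooted at x into a list linked through
    the right fields, in key order, and append the list headed by y."""
    if x is None:
        return y
    x.right = flatten(x.right, y)  # x's right subtree, then y, follow x
    return flatten(x.left, x)      # x's left subtree precedes x
```

Calling `flatten(root, None)` on the tree with keys 1 to 7 returns node 1, and following the `right` fields from it visits the keys in increasing order.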

The following procedure, BUILD-TREE, builds a 1/2-weight-balanced tree of n nodes from a list of nodes headed by node x. It is assumed that the list of nodes has length at least n + 1. The procedure returns the (n + 1)st node in the list, s, modified so that left[s] points to the root r of the n-node tree created.
Figure 3-2: The tree INSERT(T, 8), where T is the tree of Figure 3-1.

BUILD-TREE(n, x)
  if n = 0
    then left[x] ← NIL
         return x
  r ← BUILD-TREE(⌈(n - 1)/2⌉, x)
  s ← BUILD-TREE(⌊(n - 1)/2⌋, right[r])
  right[r] ← left[s]
  left[s] ← r
  return s

A call to BUILD-TREE(n, x) runs in time O(n) and uses O(log n) space.

The following procedure, REBUILD-TREE, takes as input a pointer scapegoat to the root of a subtree to be rebuilt, and the size n of that subtree. It returns the root of the rebuilt subtree. The rebuilt subtree is 1/2-weight-balanced. The procedure utilizes the procedures FLATTEN and BUILD-TREE defined above, and runs in time O(n) and space proportional to the height of the input subtree.

REBUILD-TREE(n, scapegoat)
  create a dummy node w
  z ← FLATTEN(scapegoat, w)
  BUILD-TREE(n, z)
  return left[w]
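The recursive rebuilding method of this subsection can be sketched end to end in Python, with FLATTEN, BUILD-TREE and REBUILD-TREE translated line by line. The `Node` class and the snake_case names are assumptions of this sketch; `None` plays the role of NIL, and for $n \ge 1$ the integer expressions `n // 2` and `(n - 1) // 2` compute $\lceil (n-1)/2 \rceil$ and $\lfloor (n-1)/2 \rfloor$.

```python
from typing import Optional


class Node:
    def __init__(self, key=None, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def flatten(x: Optional[Node], y: Optional[Node]) -> Optional[Node]:
    """FLATTEN: list of x's subtree (linked by right fields) followed by list y."""
    if x is None:
        return y
    x.right = flatten(x.right, y)
    return flatten(x.left, x)


def build_tree(n: int, x: Node) -> Node:
    """BUILD-TREE: build a 1/2-weight-balanced tree from the first n nodes of the
    list headed by x; return node n + 1 of the list, whose left field holds the root."""
    if n == 0:
        x.left = None
        return x
    r = build_tree(n // 2, x)              # ceil((n - 1) / 2) nodes on the left
    s = build_tree((n - 1) // 2, r.right)  # floor((n - 1) / 2) nodes on the right
    r.right = s.left
    s.left = r
    return s


def rebuild_tree(n: int, scapegoat: Node) -> Optional[Node]:
    """REBUILD-TREE: return the root of a 1/2-weight-balanced rebuild of the
    n-node subtree rooted at scapegoat."""
    w = Node()                  # dummy node terminating the list
    z = flatten(scapegoat, w)
    build_tree(n, z)
    return w.left
```

For example, rebuilding a right-leaning chain of 15 nodes yields a tree of height 3 with the same in-order sequence of keys; the dummy node `w` never becomes part of the returned tree.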

Figures 3.4.2 and 3 illustrate this process. 
Figure 3-3: Non-recursive rebuilding in place. An intermediate state during the execution of a rebuilding in place of the tree INSERT(T, 8). Node 11 is the new root of the subtree being rebuilt. (See T in Figure 3-1.)

3.6.2 A Non-Recursive Method. 

This section suggests a non-recursive method for rebuilding a tree in logarithmic space, which proved to be faster in our experiments than the previous version.

We traverse the old tree in-order. Since the number of nodes in the tree is known, the 
new place of each node we encounter can be uniquely determined. Every node is "plugged 
into" the right place in the new tree upon being visited, thereby creating the new tree in 
place. 

We need to keep track of the "cutting edge" of the two tree traversals, as shown in Figure 3-3. Since the depth of both trees is logarithmic, two stacks of logarithmic size suffice for this purpose.

The procedure REBUILD-TREE provides the same interface as the procedure with the same name given in Section 3.6.1. It calls the procedures GET-NEXT-NODE and ADD-NEW-NODE, which are described below.

Our pseudo-code calls the standard stack-handling routines POP, PUSH, CREATE and TOP. It also uses SECOND, a routine that peeks at the second element on the stack.
REBUILD-TREE(n, scapegoat)
  insert_type ← I-T-LEFT
  slots_in_last_level ← 2^⌊lg n⌋
  nodes_for_last_level ← n - slots_in_last_level + 1
  ratio ← nodes_for_last_level / slots_in_last_level
  CREATE(Ruining_Stack)
  CREATE(Building_Stack)
  PUSH(Ruining_Stack, scapegoat)
  while n > 0
    do n ← n - 1
       insert_type ← ADD-NEW-NODE(GET-NEXT-NODE(), insert_type)
  return TOP(Building_Stack)

The routine GET-NEXT-NODE traverses the old tree in-order. It uses a stack, Ruining_Stack, to store pointers to the nodes of the old subtree. The size of this stack is bounded by the depth of the subtree being rebuilt, i.e., by $h_\alpha(n) + 2$, where n is the size of the subtree.

GET-NEXT-NODE()
  next_node ← TOP(Ruining_Stack)
  while left[next_node] ≠ NIL
    do father_node ← next_node
       next_node ← left[next_node]
  if next_node = TOP(Ruining_Stack)
    then POP(Ruining_Stack)
    else left[father_node] ← NIL
  if right[next_node] ≠ NIL
    then PUSH(Ruining_Stack, right[next_node])
  return next_node
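GET-NEXT-NODE translates almost literally into Python. In this sketch (our own; the `Node` class and the function name are hypothetical) a Python list serves as Ruining_Stack, with TOP, POP and PUSH becoming `[-1]`, `pop()` and `append()`. Each returned node has already been detached from the old tree (either it is popped, or its father's left field is cleared), and its right subtree, if any, has been pushed; this is what allows ADD-NEW-NODE to reuse the node's pointer fields in place.

```python
from typing import List


class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def get_next_node(ruining_stack: List[Node]) -> Node:
    """One call of GET-NEXT-NODE: return the next node of the old subtree in
    key order. The stack holds roots of the parts not yet emitted."""
    next_node = ruining_stack[-1]
    father_node = None
    while next_node.left is not None:       # walk down to the leftmost node
        father_node, next_node = next_node, next_node.left
    if next_node is ruining_stack[-1]:
        ruining_stack.pop()                 # the top itself was leftmost
    else:
        father_node.left = None             # cut the node off from its father
    if next_node.right is not None:
        ruining_stack.append(next_node.right)
    return next_node
```

Starting from a stack containing only the root, n successive calls return the n nodes of the subtree in key order and leave the stack empty.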

The routine ADD-NEW-NODE creates a perfectly balanced tree from the nodes that are passed to it in-order.

ADD-NEW-NODE accesses and modifies the global variables ratio, nodes_for_last_level and slots_in_last_level that were set by REBUILD-TREE. It assumes that the number of times it will be called is compatible with the initial value of nodes_for_last_level.

The parameters of ADD-NEW-NODE are next_node and insert_type. The first, next_node, is a tree node; the nodes are assumed to be passed in-order. The second parameter, insert_type, can be equal to one of three constants: I-T-LEFT if the node is to be inserted as a leaf which is a left son of its parent; I-T-RIGHT, the same as I-T-LEFT but for a right son; and I-T-PARENT if the node is not a leaf. ADD-NEW-NODE returns the value that should be passed as insert_type on the next call.

ADD-NEW-NODE uses a stack, Building_Stack, whose size is bounded by lg n + 1, where n is the size of the subtree being rebuilt. The records stored on this stack contain four fields: a pointer to a tree node, height, lacks_right_son and lacks_father. The height held is a nonnegative integer that records the height of the appropriate node in the new tree relative to the deepest leaf in the tree. The boolean fields lacks_right_son and lacks_father indicate the reason that caused us to push the record on the stack. Possible reasons are that the node does not have a father yet, or that the node's right son has not been determined yet. For every record on the stack at least one of lacks_right_son and lacks_father is set to TRUE. We will refer to these fields in the order in which they were described. Hence, {node, 7, TRUE, FALSE} will denote a record that points to node node, with height equal to 7, lacks_right_son set to TRUE and lacks_father set to FALSE.

ADD-NEW-NODE(next_node, insert_type)
  if insert_type ≠ I-T-PARENT
    then slots_in_last_level ← slots_in_last_level - 1
         if nodes_for_last_level / slots_in_last_level < ratio
           then return SKIP-A-LEAF(next_node, insert_type)
           else nodes_for_last_level ← nodes_for_last_level - 1
                return ADD-A-LEAF(next_node, insert_type)
    else return ADD-NON-LEAF(next_node)
SKIP-A-LEAF(next_node, insert_type)
  if insert_type = I-T-LEFT
    then ▷ skip a left leaf
         left[next_node] ← NIL
         if height[TOP(Building_Stack)] = 2
           then right[TOP(Building_Stack)] ← next_node
                if ¬lacks_father[TOP(Building_Stack)]
                  then POP(Building_Stack)
                  else lacks_right_son[TOP(Building_Stack)] ← FALSE
                PUSH(Building_Stack, {next_node, 1, TRUE, FALSE})
           else PUSH(Building_Stack, {next_node, 1, TRUE, TRUE})
         return I-T-RIGHT
    else ▷ skip a right leaf
         right[TOP(Building_Stack)] ← NIL
         if ¬lacks_father[TOP(Building_Stack)]
           then POP(Building_Stack)
           else lacks_right_son[TOP(Building_Stack)] ← FALSE
         return ADD-NON-LEAF(next_node)
ADD-A-LEAF(next_node, insert_type)
  right[next_node] ← NIL
  left[next_node] ← NIL
  if insert_type = I-T-LEFT
    then PUSH(Building_Stack, {next_node, 0, FALSE, TRUE})
    else right[TOP(Building_Stack)] ← next_node
         if lacks_father[TOP(Building_Stack)]
           then lacks_right_son[TOP(Building_Stack)] ← FALSE
           else POP(Building_Stack)
  return I-T-PARENT



ADD-NON-LEAF(next_node)
  left[next_node] ← TOP(Building_Stack)
  next_node's_height ← height[TOP(Building_Stack)] + 1
  POP(Building_Stack)
  if height[SECOND(Building_Stack)] = next_node's_height + 1
    then right[TOP(Building_Stack)] ← next_node
         if ¬lacks_father[TOP(Building_Stack)]
           then POP(Building_Stack)
           else lacks_right_son[TOP(Building_Stack)] ← FALSE
         PUSH(Building_Stack, {next_node, next_node's_height, TRUE, FALSE})
    else PUSH(Building_Stack, {next_node, next_node's_height, TRUE, TRUE})
  if height[TOP(Building_Stack)] > 1
    then return I-T-LEFT
    else return I-T-RIGHT
3.7 More on Applications of Scapegoat Techniques

Scapegoat balancing techniques are applicable not only to binary search trees, but also to other tree-based data structures. We first state sufficient conditions for their applicability, and then describe some known data structures which meet these conditions. No balancing scheme that does not store auxiliary information at the nodes was previously known for the multi-key data structures to which we show scapegoat techniques can be applied. The discussion in the second subsection addresses the upgrading of code that supports unbalanced data structures to scapegoat-balanced structures. It also suggests steps for the initial coding of scapegoat trees.

3.7.1 Multi-Key Data 

The ideas underlying scapegoat trees are those of finding and rebuilding a subtree whose root is not weight-balanced when the tree gets too deep, and of periodically rebuilding the root after enough DELETEs have occurred. This technique can be applied to other tree-like data structures. To allow this, it should be possible to find the scapegoat node and to rebuild the subtree rooted at it. The time to find the scapegoat and the rebuilding time do not have to be linear in the number of nodes in the subtree being rebuilt, as was the case with binary search trees (Theorem 3.5.3). It is also not necessary for the rebuilding algorithm to yield a perfectly balanced subtree. These generalizations of the main theorem allow us to apply scapegoat techniques to an array of other tree-like data structures.

A Stronger Version of the Main Theorem. 

Suppose that for a class of trees there exist some fixed $\alpha_{bal} \ge 1/2$, a function $F$ with $F(n) = \Omega(1)$ satisfying $F(Cn) = O(F(n))$ for any constant $C$, and an algorithm that, when given $n$ nodes, can in $O(nF(n))$ steps build a tree containing those nodes that is $\alpha_{bal}$-weight-balanced. We will call such a rebuilding routine an $\alpha_{bal}$-relaxed rebuilding routine. Also suppose there exists an algorithm that can find an ancestor of a given node that is not weight-balanced in $O(nF(n))$ time, where $n$ is the size of the subtree rooted at the scapegoat node, provided such an ancestor exists. Then we can use scapegoat techniques to support dynamic updates to this class with amortized logarithmic complexity. When $F(n)$ is constant and $\alpha_{bal} = 1/2$, we have the previously handled situation of Theorem 3.5.3.

For a fixed $\alpha_{trigger} > \alpha_{bal}$, an insertion of a node that is deep with respect to $\alpha_{trigger}$ would trigger a rebuilding. Claim 3.5.1 guarantees that such a node has an $\alpha_{trigger}$-weight-unbalanced ancestor. However, for any constants $\alpha, \beta$ with $1/2 < \alpha < \beta$, and for $n$ large enough, there exists a $\beta$-weight-unbalanced tree of size $n$ that can be rebuilt into a deeper $\alpha$-weight-balanced tree. Hence, we cannot choose an arbitrary $\alpha_{trigger}$-weight-unbalanced ancestor of the deep node to be the scapegoat. However, if we choose as the scapegoat an ancestor $x$ of the deep node that satisfies the analogue of condition (3.6):

$$h(x) > h_{\alpha_{trigger}}(\mathrm{size}(x)), \qquad (3.8)$$

we can prove the following theorem.

Theorem 3.7.1 A relaxed scapegoat tree can handle a sequence of $n$ INSERT and $m$ SEARCH or DELETE operations, beginning with a 1/2-weight-balanced tree, with an amortized cost of $O(F(n) \log_{1/\alpha_{trigger}} n)$ per INSERT or DELETE and $O(\log_{1/\alpha_{trigger}} k)$ worst-case time per SEARCH, where $k$ is the size of the tree the SEARCH is performed on.

Proof (sketch): The existence of an ancestor that satisfies inequality (3.8) is guaranteed as explained in Section 3.5 (the root of the tree satisfies it). It follows from the way the scapegoat was chosen that rebuilding the subtree rooted at it decreases the depth of the rebuilt subtree, allowing us to prove a result similar to Claim 3.5.7. The other claims leading to Theorem 3.5.3 can also be proven for relaxed rebuilding. Hence, we can indeed support a tree of depth at most $\log_{1/\alpha_{trigger}} k + 1$, where $k$ is the size of the tree, thereby establishing the bound on the worst-case search time.

To prove the amortized bound on the complexity of updates we will define a potential function $\Phi$ in an inductive manner. Let the potential of the nodes in a subtree that was just rebuilt and of newly inserted nodes be 0. Every time a node is traversed by an update operation, increase its potential by $F(N)$, where $N$ is the size of the subtree rooted at that node. For any update operation, the node whose potential is increased the most is the root. Hence the total price of the update operation is bounded by

$$(F(N) + 1) \log_{1/\alpha_{trigger}} N = O(F(N) \log_{1/\alpha_{trigger}} N),$$

as $F(n) = \Omega(1)$.

If the root is $\alpha_{trigger}$-weight-unbalanced, then $CN$ different update operations traversed it since it was inserted or last rebuilt. Now $C > C_0$, where

$$C_0 = \frac{\alpha_{trigger} - \alpha_{bal}}{2\alpha_{trigger} - \alpha_{bal}}.$$

At each one of the last $C_0 N$ passes the potential of the root was increased by at least $F((1 - C_0)N)$. Hence, the total potential stored at the root is at least $C_0 N \cdot F((1 - C_0)N) = \Omega(N F(N))$ (using $F(Cn) = O(F(n))$), allowing it to pay for the rebuilding operation.

□ 

Scapegoat k-d Trees.

Bentley [12] introduced k-d trees. He proved average-case bounds of $O(\lg n)$ for a tree of size $n$ for both updates and searches. Bentley [13] and Overmars and van Leeuwen [55] propose a scheme for dynamic maintenance of k-d trees that achieves a logarithmic worst-case bound for searches with an average-case bound of $O((\lg n)^2)$ for updates. Both use an idea similar to ours of rebuilding weight-unbalanced subtrees. Overmars and van Leeuwen called their structure pseudo k-d trees.

Scapegoat k-d trees achieve logarithmic worst-case bounds for searches and an $O(\log n)$ amortized bound for updates. (The analysis of updates of Overmars and van Leeuwen [55] and Bentley [13] can be improved to yield amortized rather than average-case bounds.) However, scapegoat k-d trees do not require maintaining extra data at the nodes. Also, we believe they might prove to be faster in practice, as they do not rebuild every weight-unbalanced node, thereby allowing it to become balanced by future updates.

Applying Theorem 3.7.1 we get: 

Theorem 3.7.2 A scapegoat k-d tree can handle a sequence of $n$ INSERT and $m$ SEARCH or DELETE operations, beginning with a 1/2-weight-balanced tree, with $O(\log n)$ amortized cost per INSERT or DELETE and $O(\log k)$ worst-case time per SEARCH, where $k$ is the size of the tree the SEARCH is performed on.

Proof: To apply Theorem 3.7.1 we use the algorithm Bentley [12] proposes for building a perfectly balanced k-d tree of $N$ nodes in $O(kN \lg N)$ time, by taking as a splitting point the median with respect to the splitting coordinate. Finding the scapegoat is done in a manner similar to that in binary search trees. □
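For the k-d tree case, a perfectly balanced tree can be built by recursive median splitting. The Python sketch below is our own simplified illustration, not Bentley's algorithm as published: it re-sorts the points at every level, so it performs $O(N \log^2 N)$ comparisons instead of meeting the $O(kN \lg N)$ bound cited above. The `KDNode` class and `build_kd` are hypothetical names, and the splitting coordinate cycles with depth.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KDNode:
    def __init__(self, point: Point, axis: int) -> None:
        self.point = point
        self.axis = axis                     # splitting coordinate at this node
        self.left: Optional["KDNode"] = None
        self.right: Optional["KDNode"] = None


def build_kd(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a k-d tree whose subtrees differ in size by at most one at every
    node, splitting at the median of coordinate depth mod k."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                  # median position
    node = KDNode(ordered[mid], axis)
    node.left = build_kd(ordered[:mid], depth + 1)
    node.right = build_kd(ordered[mid + 1:], depth + 1)
    return node
```

Points on the left of a node never exceed it in the splitting coordinate and points on the right are never smaller, so the usual k-d search applies; a relaxed rebuilding routine in the sense of Theorem 3.7.1 would call such a builder on the nodes of the scapegoat's subtree.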

Scapegoat Trees for Orthogonal Queries. 

For keys which are d-dimensional vectors one may wish to specify a range for each component of the key and ask how many keys have all components in the desired range. Lueker [40] proposed an algorithm that handles range queries in $O(\log^d n)$ worst-case time, where $n$ is the size of the tree. Updates are handled in $O(n \log^d n)$ amortized time.

Lueker's paper proves that given a list of $n$ keys a 1/3-balanced tree may be formed in $O(n \log^{\min(1, d-1)} n)$ time.

Using this in Theorem 3.7.1 proves 

Theorem 3.7.3 A scapegoat orthogonal tree can handle a sequence of $n$ INSERT and $m$ SEARCH or DELETE operations, beginning with a 1/2-weight-balanced tree, with $O(\log^{\min(1, d-1)+1} n)$ amortized cost per INSERT or DELETE and $O(\log k)$ worst-case time per range query, where $k$ is the size of the tree the range query is performed on.

Note that our algorithm improves Lueker's amortized bounds for updates, and does not require storage of balancing data at the nodes of the tree.

Scapegoat Quad Trees. 

Quad trees were introduced by Finkel and Bentley [25]. They achieve a worst-case bound of $O(\log_2 N)$ per search. (As in a d-dimensional quad tree every node has $2^d$ children, naively one could expect an $O(\log_{2^d} N)$ worst-case search time.) They do not address deletion, and give only experimental results for insertion times. Samet [62] proposed an algorithm for deletions. Overmars and van Leeuwen [55] introduced pseudo-quad trees, a dynamic version of quad trees. They suggest an algorithm for achieving $O((\lg N)^2)$ average insertion and deletion times, where $N$ is the number of insertions, while improving the worst-case search time to $\log_{d+1-\delta} n + O(1)$, where $d$ is the dimension of the tree, $n$ the size of the tree the search is performed on, and $\delta$ an arbitrary constant satisfying $1 < \delta < d$.
Scapegoat quad trees can be compared to pseudo-quad trees:

• Scapegoat trees offer worst-case search time of $C \log_{d+1} n$ for any constant $C$, or, following the original notation of Overmars and van Leeuwen, $\log_{d+1-\delta} n$ for any positive constant $\delta$ (note that we do not require $1 < \delta$).

• The bounds on updates are improved from average-case to amortized bounds. (Though careful analysis of the algorithm of Overmars and van Leeuwen [55] can yield amortized bounds too.)

• Scapegoat trees do not require maintenance of extra data at the nodes regarding the weight of the children of each node. This saving can be quite substantial in this case, as each node has $2^d$ children, where $d$ is the dimension of the tree.

• Scapegoat trees might prove faster in practice, as they do not require the rebuilding of every weight-unbalanced node, thereby allowing some nodes to be balanced by future updates. Also, more compact storage might result in greater speed.

We call a multi-way node $x$ $\alpha$-weight-balanced if every child $y$ of $x$ satisfies $\mathrm{size}(y) \le \alpha \cdot \mathrm{size}(x)$. Weight-balanced and height-balanced trees are defined in a way similar to that used for binary trees.

Theorem 2.2.3 in Overmars and van Leeuwen [55] suggests how to build a $1/(d+1)$-weight-balanced pseudo-quad tree in $O(n \log n)$ time. Finding a scapegoat in a multi-way tree can be done by traversing the tree in a manner similar to that described for binary trees, starting at the deep node and going up. Plugging this into Theorem 3.7.1 proves:

Theorem 3.7.4 A scapegoat quad tree can handle a sequence of $n$ INSERT and $m$ SEARCH or DELETE operations, beginning with a 1/2-weight-balanced tree, with $O(\log n)$ amortized cost per INSERT or DELETE and $O(\log_{d+1-\delta} k)$ worst-case time per SEARCH, where $k$ is the size of the tree the SEARCH is performed on.

3.7.2 Upgrading Unbalanced Binary Search Trees to Scapegoat Trees

To upgrade an existing database that uses a binary search tree data representation, no change to the data itself is required. One may continue to use the existing code to perform searches and modifications of the database. Scapegoat trees can be implemented as a software layer above the existing code that uses the existing code as a subroutine.

The scapegoat layer can maintain the two quantities required to trigger rebuildings that result from deletions, size[T] and max_size[T], without referring to the inner state of the data structure or to the details of the old search trees' implementation. A rebuilding needs to be carried out following a deep insertion. Deep insertions can possibly be diagnosed by measuring insertion time. More realistically, one can count the calls made by the old search trees' layer to the layer under it. This count reflects the number of nodes in the data structure that are accessed, so the insertion of a deep node causes it to surpass a prespecified threshold.
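The access-counting idea can be sketched in Python for insertions only. Everything named here is hypothetical: `old_insert` stands in for the unchanged existing code, `CountingStore` for the lower layer whose calls are counted, and `ScapegoatLayer` for the new layer. The threshold assumes the usual scapegoat depth bound $h_\alpha(n) = \lfloor \log_{1/\alpha} n \rfloor$; keys are assumed distinct, and the DELETE trigger on size[T] and max_size[T] is omitted. A rebuild only relinks existing nodes, so the node format the old code relies on is preserved.

```python
class Node:
    """Node format of the (hypothetical) existing BST code; never changed."""
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

class CountingStore:
    """Hypothetical lower layer: the old code reads child pointers only
    through get_child, so every node access can be counted."""
    def __init__(self):
        self.accesses = 0

    def get_child(self, node, go_left):
        self.accesses += 1
        return node.left if go_left else node.right

def old_insert(store, root, key):
    """Stand-in for the existing, unbalanced insertion code."""
    if root is None:
        return Node(key)
    x = root
    while True:
        go_left = key < x.key
        nxt = store.get_child(x, go_left)
        if nxt is None:
            if go_left:
                x.left = Node(key)
            else:
                x.right = Node(key)
            return root
        x = nxt

def h_alpha(n, alpha):
    """floor(log_{1/alpha} n), computed by repeated division."""
    h, bound = 0, 1.0
    while bound / alpha <= n:
        bound, h = bound / alpha, h + 1
    return h

def subtree_size(x):
    return 0 if x is None else 1 + subtree_size(x.left) + subtree_size(x.right)

def height(x):
    return -1 if x is None else 1 + max(height(x.left), height(x.right))

def flatten(x, out):
    """Append the nodes of subtree x to out in key order."""
    if x is not None:
        flatten(x.left, out)
        out.append(x)
        flatten(x.right, out)
    return out

def build_balanced(nodes, lo, hi):
    """Relink the sorted nodes[lo:hi] into a perfectly balanced subtree."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    nodes[mid].left = build_balanced(nodes, lo, mid)
    nodes[mid].right = build_balanced(nodes, mid + 1, hi)
    return nodes[mid]

class ScapegoatLayer:
    """Insert-only layer above old_insert; keys are assumed distinct."""
    def __init__(self, alpha):
        self.alpha, self.root, self.size = alpha, None, 0
        self.store = CountingStore()

    def insert(self, key):
        self.store.accesses = 0
        self.root = old_insert(self.store, self.root, key)
        self.size += 1
        # the access count equals the depth of the new node
        if self.store.accesses > h_alpha(self.size, self.alpha):
            self._rebuild_scapegoat(key)

    def _rebuild_scapegoat(self, key):
        path, x = [], self.root            # root-to-new-node path
        while x is not None:
            path.append(x)
            x = None if key == x.key else (x.left if key < x.key else x.right)
        child_size = 1                     # the new node is a leaf
        for i in range(len(path) - 2, -1, -1):
            x, child = path[i], path[i + 1]
            sib_size = subtree_size(x.right if child is x.left else x.left)
            x_size = 1 + child_size + sib_size
            if max(child_size, sib_size) > self.alpha * x_size:  # scapegoat
                new_sub = build_balanced(flatten(x, []), 0, x_size)
                if i == 0:
                    self.root = new_sub
                elif path[i - 1].left is x:
                    path[i - 1].left = new_sub
                else:
                    path[i - 1].right = new_sub
                return
            child_size = x_size
```

Inserting keys in increasing order through this layer keeps the height logarithmic, whereas `old_insert` alone would produce a path.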

Rebuildings as well as look-ups of weight unbalanced ancestors are carried out by a 
subroutine of the scapegoat layer. They modify the data only. Provided the data format 
is maintained, the old code shall work correctly with the rebuilt tree. 

At the time of the switch from the old code to the new code, a single rebuilding of the whole tree suffices, by Theorem 3.5.3. Thus the cost of the switch is linear in the size of the data base.

The cost of "plugging in" scapegoat balancing can instead be amortized over the first $\Omega(\mathrm{size}[T])$ update operations, avoiding the rebuilding of the whole tree required above. The scapegoat node can always be chosen at depth $\le h_\alpha(n)$. Indeed, by Claim 3.5.1, any node at depth $> h_\alpha(n)$ must have a weight-unbalanced ancestor at depth $\le h_\alpha(n)$. If we always choose the scapegoat at depth $\le h_\alpha(n)$, then we can prove:

Theorem 3.7.5 Starting with an arbitrary binary tree, the amortized complexity of $\Omega(\mathrm{size}[T])$ update operations that use the scapegoat balancing scheme is $O(\lg \mathrm{size}[T])$ per operation, for some scapegoat selection schemes.

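Proof: Notice that potential has to be maintained only at nodes that can be rebuilt. If we choose the scapegoat at depth $\le h_\alpha(n)$, then we need to maintain potential only at nodes of depth at most $h_\alpha(n)$. For every depth $k$, the subtrees rooted at the nodes of depth $k$ are disjoint, so

$$\sum_{\{x \,:\, d(x) = k\}} \bigl|\mathrm{size}[\mathrm{left}[x]] - \mathrm{size}[\mathrm{right}[x]]\bigr| \le \mathrm{size}[T].$$

Summing over the $h_\alpha(n) + 1$ levels of depth at most $h_\alpha(n)$, the total potential that needs to be stored at an arbitrary binary tree to subsequently support scapegoat-balanced updates is $O(n\,h_\alpha(n)) = O(n \lg n)$. Amortizing this over $\Omega(n)$ operations gives the desired bound. □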
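The per-level inequality in the proof can be checked numerically on arbitrary (unbalanced) binary trees. The sketch below is only an illustration: it assumes nodes are `[key, left, right]` lists and uses hypothetical helper names. It sums $|\mathrm{size}[\mathrm{left}[x]] - \mathrm{size}[\mathrm{right}[x]]|$ over each depth and also totals the potential over depths $0, \ldots, h$.

```python
from collections import defaultdict

def bst_insert(root, key):
    """Plain unbalanced BST insertion; a node is a list [key, left, right]."""
    node = [key, None, None]
    if root is None:
        return node
    x = root
    while True:
        i = 1 if key < x[0] else 2
        if x[i] is None:
            x[i] = node
            return root
        x = x[i]

def imbalance_by_depth(root):
    """Return (n, sums): n is the number of nodes and sums[k] is the sum,
    over nodes x at depth k, of |size(left[x]) - size(right[x])|."""
    sums = defaultdict(int)

    def walk(x, depth):
        if x is None:
            return 0
        l, r = walk(x[1], depth + 1), walk(x[2], depth + 1)
        sums[depth] += abs(l - r)
        return 1 + l + r

    n = walk(root, 0)
    return n, dict(sums)

def potential_up_to(sums, h):
    """Potential stored at depths 0..h; at most (h + 1) * n by the per-level bound."""
    return sum(s for k, s in sums.items() if k <= h)
```

For a path built from sorted keys, the root alone contributes $n - 1$, which shows the per-level bound of size[T] is essentially tight.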

Similarly, scapegoat balancing can be added onto the various tree-based schemes discussed in Section 3.7. This feature can also support a two-stage development of scapegoat code: the unbalanced structure produced in the first stage provides all of the complete system's features except performance.

3.8 Reducing Delete Incurred Restructuring 

By rebuilding the whole tree whenever the triggering condition

$$\mathrm{size}[T] < \alpha \cdot \mathrm{max\_size}[T] \qquad (3.7)$$

is satisfied, the tree is guaranteed to stay loosely $\alpha$-height-balanced (Theorem 3.5.3), that is,

$$h(T) \le h_\alpha(T) + 1. \qquad (3.9)$$

We can reduce the frequency of DELETE-induced restructuring without violating the logarithmic depth of scapegoat trees. Call a binary tree $L$-loosely $\alpha$-height-balanced if

$$h(T) \le h_\alpha(T) + L. \qquad (3.10)$$
In this section we show that if DELETE-induced restructuring is triggered by the satisfaction of the condition

$$\mathrm{size}[T] < \alpha^{L} \cdot \mathrm{max\_size}[T], \qquad (3.11)$$

the tree can be kept $L$-loosely $\alpha$-height-balanced.

Claim 3.8.1 A scapegoat tree can handle a sequence of INSERT and DELETE operations, with DELETE-induced whole-tree restructuring triggered by the satisfaction of

$$\mathrm{size}[T] < \alpha^{L} \cdot \mathrm{max\_size}[T],$$

while remaining $L$-loosely $\alpha$-height-balanced after each operation.

Proof: The proof is similar to that of Claim 3.5.10. Restricting our attention to the subsequence of operations $o_{i_1}, \ldots, o_{i_k}$ that modify $h_\alpha(T)$, we observe that, by a generalization of Claim 3.5.9, an INSERT operation in this subsequence into an unbalanced tree reduces the looseness of the tree's balance by 1. Thus an INSERT operation in this subsequence that increases max_size[T] leaves the tree $\alpha$-height-balanced. The degree of looseness of a tree is therefore bounded above by the difference between the number of DELETE and INSERT operations in this subsequence since the last increase of max_size[T]. Condition (3.11) guarantees that this quantity does not exceed $L$. □

Comment: The looser triggering condition for DELETE induced restructuring specified 
by (3.11) applies to multi-key scapegoat trees as well. 

3.9 Comparison to Andersson's Work 

We arrived at our result unaware of Andersson's publication [2], which preceded our discovery by about a year. Even in light of his priority, scapegoat trees contribute to the theoretical understanding of the family of data structures that use partial rebuilding to enforce a bound on the tree's depth [3, 13, 55, 40, 25].

The first part of his thesis [3] culminates with the presentation of two data structures he calls GB(c) trees and GB (c) trees (General Balanced trees). He terms these "superclasses" containing all other classes of balanced trees. They satisfy the simplest possible criterion that guarantees logarithmic-time searching: $O(\lg n)$ height. Most other tree-based data structures, like AVL trees [1], Red-Black trees [9, 29], BB($\alpha$) trees [53], aBB trees [54] and Andersson's BH(c) trees, impose a balance condition that excludes some trees of height $O(\lg n)$ from the class of allowed trees. Andersson's GB(c) trees and GB (c) trees are the first known schemes that admit all trees that can be searched in logarithmic time. The class of GB (c) trees has the extra merit of storing no extra information at the nodes to support balancing. (Andersson's $c$ is related to our $\alpha$ by the equation $c = -(\lg \alpha)^{-1}$.)

Like GB (c) trees, scapegoat trees form a "superclass" of trees that maintains no balancing information at the nodes. A comparison of the main theorems for scapegoat trees and GB (c) trees shows the greater generality of our theoretical analysis.

In particular, Theorem 3.5.1 about rebuilding following an insertion is stronger than Andersson's comparable claim. After a deep node is inserted into a GB (c) tree, the path up to the root is retraced until the first node $x$ satisfying

$$h(x) > \lceil c \lg \mathrm{size}(x) \rceil$$

is encountered and rebuilt. He chooses the scapegoat using what we call the "alternative" method (equation (3.6)). Theorem 3.5.1 asserts that any weight-unbalanced node on the path from the deep node to the root may be rebuilt to restore the balance. According to Claim 3.5.2, any node chosen by the "alternative" method is weight-unbalanced. Hence this method is only a special case of the more general analysis in Theorem 3.5.1.

As for deletion, Andersson proves that to handle the imbalances resulting from deletions it is sufficient to rebuild the whole tree whenever

$$d(T) > \gamma \cdot \mathrm{size}[T],$$

where $d(T)$ is the number of deletions performed since the last rebuilding of the whole tree. He proves that GB (c) trees compromise perfect height balance for $c\lg(1+\gamma)$-loose height balance to accommodate deletions. That is, nodes might be $c\lg(1+\gamma)$ levels deeper than the desired maximal depth of $h_{2^{-1/c}}(\mathrm{size}[T])$.

We prove that to maintain an $L$-loosely $\alpha$-height-balanced scapegoat tree it is sufficient to rebuild the whole tree whenever (equation (3.11))

$$\mathrm{size}[T] < \alpha^{L} \cdot \mathrm{max\_size}[T]$$

or, equivalently,

$$\mathrm{max\_size}[T] > \alpha^{-L} \cdot \mathrm{size}[T].$$

Solving $2^{-1/c} = \alpha$ and $c\lg(1+\gamma) = L$, we get

$$\gamma = \alpha^{-L} - 1.$$
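The two equations can be solved step by step as follows (a routine derivation, written out for readability):

```latex
\begin{align*}
2^{-1/c} = \alpha &\iff -\tfrac{1}{c} = \lg \alpha \iff c = -\frac{1}{\lg \alpha},\\
c \lg(1+\gamma) = L &\iff \lg(1+\gamma) = \frac{L}{c} = -L \lg \alpha = \lg \alpha^{-L}\\
&\iff \gamma = \alpha^{-L} - 1.
\end{align*}
```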



Since

$$d(T) \ge \mathrm{max\_size}[T] - \mathrm{size}[T],$$

we conclude that our rebuilding criterion is theoretically stronger than Andersson's comparable criterion: to maintain the same balance condition, we require less frequent rebuilding of the whole tree. Compare the number of DELETE-induced rebuildings of the whole tree in a scapegoat tree with parameter $\alpha$ to the number of such rebuildings for a GB (c) tree with $c = -(\lg \alpha)^{-1}$.

Any sequence that causes the satisfaction of the condition

$$\mathrm{max\_size}[T] > \alpha^{-L} \cdot \mathrm{size}[T] \qquad (3.12)$$

must include at least $\gamma\,\mathrm{size}[T] = (\alpha^{-L} - 1)\,\mathrm{size}[T]$ deletions. Thus any sequence of operations that causes a DELETE-induced restructuring of a scapegoat tree will also trigger at least one restructuring of the comparable GB (c) tree. The opposite is not true.

For example, consider an arbitrarily long sequence of alternating INSERTs and DELETEs. For this input sequence a scapegoat tree is never rebuilt as a whole, while a GB (c) tree is rebuilt arbitrarily many times.
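The alternating INSERT/DELETE example can be reproduced by simulating only the counters that drive the two whole-tree triggers. This is a sketch under stated assumptions: both structures start right after a whole-tree rebuild of $n_0$ keys, only DELETE-induced whole-tree rebuilds are counted, and the function names are hypothetical. `scapegoat_full_rebuilds` applies condition (3.11); `gb_full_rebuilds` applies the deletion-counting rule $d(T) > \gamma \cdot \mathrm{size}[T]$ with $\gamma = \alpha^{-L} - 1$.

```python
def scapegoat_full_rebuilds(ops, n0, alpha, L):
    """Count whole-tree rebuilds under size[T] < alpha**L * max_size[T].
    ops is a string of 'I' (INSERT) and 'D' (DELETE); only counters are simulated."""
    size = max_size = n0               # start right after a whole-tree rebuild
    rebuilds = 0
    for op in ops:
        if op == "I":
            size += 1
            max_size = max(max_size, size)
        else:
            size -= 1
            if size < alpha ** L * max_size:
                rebuilds += 1
                max_size = size        # a whole-tree rebuild resets max_size[T]
    return rebuilds

def gb_full_rebuilds(ops, n0, gamma):
    """Count whole-tree rebuilds under d(T) > gamma * size[T], where d(T)
    counts deletions since the last whole-tree rebuild."""
    size, d, rebuilds = n0, 0, 0
    for op in ops:
        if op == "I":
            size += 1
        else:
            size -= 1
            d += 1
            if d > gamma * size:
                rebuilds += 1
                d = 0
    return rebuilds

alpha, L = 0.7, 2
gamma = alpha ** -L - 1
ops = "ID" * 10_000
print(scapegoat_full_rebuilds(ops, 1000, alpha, L), gb_full_rebuilds(ops, 1000, gamma))
```

With $\alpha = 0.7$, $L = 2$ and 1000 keys, the alternating sequence never fires the scapegoat trigger, while the deletion counter fires about once every 1041 deletions; the count grows without bound as the sequence is lengthened.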
To sum up, scapegoat trees require less total rebuilding than GB (c) trees to ensure the same balance criterion.

The use of these schemes for multiway trees, as well as for upgrading existing code, is novel. Neither Sleator and Tarjan [64] nor Andersson [3] report experimental measurements of the performance of their suggested structures. In the next section we present a practical study of tree-based dictionary solutions that do not require storage of balance-enforcing information at the nodes.

3.10 Experimental Results 

Our experiments address the efficient practical implementation of scapegoat trees and compare them to other known balancing schemes for binary search trees.

3.10.1 Optimizing Scapegoat Trees 

The non-recursive method of rebuilding subtrees described in Section 3.6.2 proved to be 25-30% faster than the method described in Section 3.6.1. In Section 3.4 we described two ways to choose the scapegoat; our experiments suggest that checking for condition (3.6) yields better overall performance.

In our experiments we used a variant of the non-recursive rebuilding algorithm described by the pseudo-code in Section 3.6.2, which places all the nodes at the deepest level of the newly built subtree at the leftmost possible positions instead of spreading them evenly. This simplified the code somewhat and yielded a 6-9% speedup over the version described by the pseudo-code. Stout and Warren [68] call these route balanced trees. This issue is discussed further in Section 3.11.
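The shape produced by this variant (every level full except the deepest, whose nodes are packed to the left) can be illustrated with a short recursive sketch. It is not the non-recursive procedure of Section 3.6.2; it only computes, for a sorted key list, the same target shape. Nodes are `(key, left, right)` tuples and all function names are hypothetical. The helper `left_subtree_size` gives the size of the root's left subtree in such a tree of $n$ nodes.

```python
def left_subtree_size(n):
    """Size of the root's left subtree in a binary tree of n nodes whose
    levels are all full except the deepest, which is filled from the left."""
    if n <= 1:
        return 0
    h = n.bit_length() - 1            # depth of the deepest level
    last = n - (2 ** h - 1)           # number of nodes on the deepest level
    half = 2 ** (h - 1)               # deepest-level slots under the left subtree
    return (half - 1) + min(last, half)

def build_left_packed(keys, lo=0, hi=None):
    """Build a BST on the sorted keys[lo:hi] with deepest-level nodes leftmost."""
    if hi is None:
        hi = len(keys)
    if hi <= lo:
        return None
    mid = lo + left_subtree_size(hi - lo)
    return (keys[mid],
            build_left_packed(keys, lo, mid),
            build_left_packed(keys, mid + 1, hi))

def inorder(t):
    return [] if t is None else inorder(t[1]) + [t[0]] + inorder(t[2])

def heap_indices(t, i=1):
    """Heap-style positions (root 1, children 2i and 2i+1) of all nodes."""
    if t is None:
        return []
    return [i] + heap_indices(t[1], 2 * i) + heap_indices(t[2], 2 * i + 1)
```

For example, `build_left_packed(list(range(6)))` puts key 3 at the root, a full subtree on keys 0 to 2 on the left, and keys 4 and 5 on the right, with 4 as the left child of 5. The heap positions of an $n$-node result are exactly $1, \ldots, n$, which is what "leftmost possible positions" means.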

It is natural to expect that the optimal value of the parameter $\alpha$ should depend on the ratio between the number of searches and the number of modifications in a given sequence of requests to the scapegoat tree. The larger the proportion of searches, the more justified it is to reduce $\alpha$, thereby enforcing a shallower tree at the cost of more frequent rebuilding.
Figure 3-4: The value of $\alpha$ for which scapegoat trees performed best as a function of $N$ and $r$.

In table 3.10.1 we report the experimentally optimal value of $\alpha$ for different values of $r$ and $N$. In some practical applications, $r$, $N$, or both might be predictable in advance at the time of implementation. In such cases we suggest using the results in table 3.10.1 to tune $\alpha$.



3.10.2 Scapegoat Trees vs. Other Schemes 

We compared scapegoat trees to two other schemes for maintaining binary search trees: red-black trees and splay trees. We also compared the performance of scapegoat trees for different values of $\alpha$. We measured each of the three operations INSERT, DELETE, and SEARCH separately, under two types of workloads: uniformly distributed inputs and sorted inputs. The results are summarized in Tables 3.10.2 and 3.10.2. The tables list the average time in seconds per 128K (131,072) operations.

To compare the performance for uniformly distributed inputs, we inserted the nodes into a tree in random order, then searched for randomly chosen nodes in the tree, and finally deleted all of the nodes in random order. We tried trees of three sizes: 1K, 8K and 64K. The results appear in Table 3.10.2.

Figure 3-5: Results of comparative experiments for uniformly distributed inputs. Execution time in seconds per 128K (131,072) operations for splay trees, red-black trees and scapegoat trees with $\alpha$ varying between 0.55 and 0.75, for tree sizes of 1K, 8K and 64K.

Table 3.10.2 summarizes the results of the comparison for sorted sequences. Here too we tried three tree sizes: 1K, 8K and 64K. First we inserted the nodes into a tree in increasing order of keys, then we searched for all of the inserted keys in increasing order, and finally we deleted all of the nodes in increasing order of keys.

For uniformly distributed sequences our experiments show that one can choose $\alpha$ so that scapegoat trees outperform red-black trees and splay trees on all three operations. However, for the insertion of sorted sequences scapegoat trees are noticeably slower than the other two data structures. Hence, in practical applications, it would be advisable to use scapegoat trees when the inserted keys are expected to be roughly randomly distributed, or when the application is search-intensive.

For the splay trees we used top-down splaying as suggested by Sleator and Tarjan [64]. The implementation of red-black trees follows Chapter 14 in Cormen, Leiserson and Rivest [20].

3.11 Discussion and Conclusions 



Figure 3-6: Results of comparative experiments for monotone inputs. Execution time in seconds per 128K (131,072) operations for splay trees, red-black trees and scapegoat trees with $\alpha$ varying between 0.55 and 0.75, for tree sizes of 1K, 8K and 64K.

Stout and Warren [68] present an algorithm which takes an arbitrary binary search tree and rebalances it to form what they call a route balanced tree using linear time and only constant space. This improves upon the logarithmic space required to output a perfectly balanced tree. A route balanced tree is one containing exactly $2^d$ nodes at level $d$ for $0 \le d < \lfloor \lg n \rfloor$, with no restriction on the positions of nodes at the deepest level. Can the performance of scapegoat trees be achieved by a structure resorting to route rebalancing rather than perfect rebalancing of subtrees?

We also leave as an open problem the average-case analysis of scapegoat trees (say, 
assuming that all permutations of the input keys are equally likely). 

Section 3.4.2 proposed a few ways in which the scapegoat node can be chosen. Which one 
is superior remains an open question that may be resolved theoretically or experimentally. 

To summarize: scapegoat trees are the first "unencumbered" tree structure (i.e., having no extra storage per tree node) that achieves a worst-case SEARCH time of $O(\log n)$ with reasonable amortized update costs.